Dustin Vannoy

Specialist - Data Engineering/DevX at Databricks

San Diego, California, United States

Dustin Vannoy is a Data Engineer with experience solving business problems with analytics and big data solutions. He is passionate about all aspects of data engineering, especially building data platforms and streaming data pipelines. He currently focuses on building data platforms and pipelines in Apache Spark / Databricks, Kafka, Python, and Scala. He is co-founder of the Data Engineering San Diego meetup and encourages others to grow their data skills by making tutorials, mentoring others, and speaking at events.

Area of Expertise

Information & Communications Technology

Topics

Apache Spark
Apache Kafka
Azure
Data Engineering
Data Lakes
Big Data

dbt + Databricks: SQL based ELT

Using SQL for data transformation is a powerful way to empower an analytics team to create their own optimized data model. However, best practices like version control and data tests are often skipped. dbt is an open source tool that applies engineering best practices to SQL-based data transformations, giving you more confidence in your ELT pipeline.

This talk provides an introduction to how dbt helps with SQL-based ELT and guidance on using dbt with a Databricks SQL warehouse. We will cover patterns using dbt Cloud with Databricks as the processing engine.

Data processing with Databricks SQL

Using SQL for data transformation is a powerful way to empower an analytics team to create their own optimized data model. However, relying on SQL often comes with tradeoffs such as limited functionality, hard-to-maintain stored procedures, and skipped best practices like version control and data tests. While Azure Databricks is known as a platform for using Apache Spark with big data workloads, it has increased support for SQL workloads over the last few years. Attend this session to hear how Azure Databricks supports SQL for data transformation jobs as a core part of your Lakehouse.

In this session we will cover three options for using SQL syntax in Azure Databricks to create Delta tables:
1. Databricks workflows with Streaming Tables, Materialized Views, and SQL scripts
2. dbt: an open source framework to apply engineering best practices to SQL-based data transformations
3. SQLMesh: an open-core product to easily build high-quality and high-performance data pipelines

You will leave with a better understanding of how your team can use SQL to build complex data transformations for your Lakehouse and still have confidence in the reliability and scalability of your ETL pipeline.

Azure Data Engineer Skills for Success

Data Engineer is an exciting and rewarding role. However, many are not sure what a data engineer does and which skills are needed. This session will describe a data engineer's responsibilities and give an overview of the skills needed to be an Azure Data Engineer. You will learn about a reference architecture for data engineering in Azure and see demos of key tools an Azure Data Engineer uses. The focus will be on real-world data engineer skills, but the session will also highlight which skills are tested in the Azure Data Engineer Associate certification.
