Python Pipeline Primer: Data Engineering with Azure Databricks

Azure Databricks is a Platform-as-a-Service offering of Apache Spark that provides blazing-fast data processing, interactive querying and machine learning model hosting, all in one place! But most of the buzz is around what it means for Data Science & AI. What about the humble data engineer who wants to harness that in-memory processing power within their ETL pipelines? How does it fit into the Modern Data Warehouse? What does data preparation look like in this new world?

This session will run through best practices for implementing Azure Databricks as your data ingestion, transformation and curation tool of choice. We will:

• Introduce the Azure Databricks service
• Introduce Python and why it is the language of choice for Data Engineering on Databricks
• Discuss the various hosting & compute options available
• Demonstrate a sample data processing task
• Compare and contrast with alternative approaches using SSIS, U-SQL and HDInsight
• Demonstrate how to manage and orchestrate your processing pipelines
• Review the wider architectures and additional extension patterns

The session is aimed at Data Engineers & BI Professionals seeking to put Azure Databricks in the right context and learn how to use the service. We will not be covering the Python programming language in detail.

Simon Whiteley

Data Platform MVP. Databricks Beacon. Cloud Architect, Nerd

London, United Kingdom
