Session
Introduction to Performance Tuning on Azure Databricks
More and more organisations are building data platforms in the cloud, often utilising Spark and tools like Databricks to build data engineering pipelines. These distributed computing tools can be incredibly powerful, processing huge datasets very quickly, but they can have a steep learning curve. Teams migrating older on-premises data warehouses to cloud solutions like the Lakehouse often, and rightly, concentrate on getting good data over getting the best performance. But are you getting the most out of these shiny new tools?
In the cloud, data pipeline performance can have a big impact on monthly cost, making it much easier to justify spending time getting things running faster and more efficiently. This talk aims to show you the common pain points when working with Spark using Databricks: where to look, what to look for, and what can be done to improve things.
We’ll look at how the architecture of Spark is reflected in the Spark UI, and how to use the UI, along with query plans and cluster metrics, to get a good understanding of whether you’re wringing all the performance you can out of your cluster, or burning cash on excess compute. We’ll cover a list of quick wins to improve performance, and then look at how to identify some common problems that hurt performance.
By the end of the session, you’ll know how to check if your data pipelines are running well, and if the clusters you have are fit for the job. You’ll hopefully have a few quick ways to improve performance, save some money, or both! You’ll also know how to monitor performance after making changes, so you can check if they made a difference and, if they did, earn some kudos with your team.
50-minute session; can be adjusted for either Fabric or Synapse SQL Pools.
Niall Langley
Data Engineer / Platform Architect
Bristol, United Kingdom