
My Data is Spilling! How to Optimize Jobs in Databricks

Have you ever faced slow or inefficient Spark jobs in Databricks? Data spilling is one of the main causes of delays in big data processing. In this session, we’ll explore what data spilling is, why it happens, and how it impacts the performance of your jobs. Whether you’re a beginner or an experienced Spark user, this session will keep you engaged while sharing valuable, practical insights.

You will learn what data spilling is and how to identify it, and explore optimization techniques, including memory tuning, partition configuration, and more.
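One common partition-tuning heuristic related to spilling is sizing shuffle partitions so each stays near a target size (often around 128 MiB), since oversized partitions are more likely to exceed executor memory and spill to disk. The sketch below is an illustrative, hypothetical helper (not from the session) that estimates a value for Spark's `spark.sql.shuffle.partitions` setting from an estimated shuffle data size:

```python
def recommended_shuffle_partitions(shuffle_bytes: int,
                                   target_partition_bytes: int = 128 * 1024**2,
                                   minimum: int = 1) -> int:
    """Estimate a shuffle partition count so each partition is
    roughly `target_partition_bytes` (default 128 MiB), a common
    rule of thumb to reduce the risk of memory spill.
    """
    # Ceiling division: round up so no partition exceeds the target size.
    count = -(-shuffle_bytes // target_partition_bytes)
    return max(minimum, count)


# Example: ~100 GiB of shuffled data at 128 MiB per partition -> 800 partitions.
# In Databricks you would then apply it with something like:
#   spark.conf.set("spark.sql.shuffle.partitions", str(n))
n = recommended_shuffle_partitions(100 * 1024**3)
print(n)  # → 800
```

The right target size depends on your cluster's executor memory and workload, so treat this as a starting point rather than a fixed rule.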

Rui Carvalho

Data Engineer at Devscope

Porto, Portugal


