Getting Started with Delta & The Lakehouse

Data Lakes have been around for an age, but they were often a niche, specialist thing. With huge advances in Parquet, the Delta format and lakehouse approaches, suddenly everyone wants to be lake-based... so how do you catch up?

This session runs through a quick recap of why lakes were so difficult, before digging into the Delta Lake format and all of the features it brings. We'll look at what Delta gives you through Spark, as well as through managed platforms such as Microsoft Fabric.

We'll spend some time looking at the more advanced features: how we can achieve incremental merges, transactional consistency, temporal rollbacks and file optimisation, plus some deep and dirty performance tuning with partitioning, Z-ordering and V-ordering.

If you’re planning, currently building, or looking after a Data Lake with Spark and want to get to the next level of performance and functionality, this session is for you. Never heard of Parquet or Delta? You’re about to learn a whole lot more!

Simon Whiteley

Data Platform MVP. Databricks Beacon. Cloud Architect, Nerd

London, United Kingdom
