Harnessing the Power of Apache Spark & Delta Lake in the Microsoft Data Ecosystem

Apache Spark is a powerful distributed compute engine that has become an industry-leading solution for data processing. In this session we’ll first introduce Apache Spark, exploring its core concepts and capabilities. We’ll then discuss how Apache Spark can be used across Microsoft products, including Azure Databricks, Azure Data Factory, Azure Synapse Analytics, and Microsoft Fabric, to build robust, scalable data processes that support advanced analytics and drive business insights.

Next, we’ll explore combining Apache Spark with the open standard Delta Lake to address both compute and storage, giving us a complete cloud-native data platform. Delta Lake enhances Spark's capabilities by providing ACID transactions, scalable metadata handling, and unified support for streaming and batch workloads. This integration ensures data reliability and consistency while enabling efficient large-scale data operations. Together, Apache Spark and Delta Lake make it possible to build resilient data pipelines, allowing businesses to make full use of their data assets and achieve seamless data integration, transformation, and analytics within the Microsoft data ecosystem.

Paul Andrew

Co-Founder & CTO of Cloud Formations | Microsoft MVP

Derby, United Kingdom
