Session
Harnessing the Power of Apache Spark & Delta Lake in the Microsoft Data Ecosystem
Apache Spark is a powerful distributed compute engine that has become an industry-leading solution for data processing. In this session we’ll first introduce Apache Spark, exploring its core concepts and capabilities. Next, we’ll discuss how Apache Spark can be implemented using various Microsoft products, including Azure Databricks, Azure Data Factory, Azure Synapse Analytics, and Microsoft Fabric, to build robust, scalable data processes that perform advanced analytics and drive business insights.
We’ll then explore combining Apache Spark with the open standard Delta Lake to offer a comprehensive solution that addresses both compute and storage, allowing us to create a complete cloud-native data platform. Delta Lake enhances Spark’s capabilities by providing ACID transactions, scalable metadata handling, and unified support for streaming and batch workloads. This integration ensures data reliability and consistency while enabling efficient large-scale data operations. Together, Apache Spark and Delta Lake facilitate the construction of resilient data pipelines, allowing businesses to make full use of their data assets and achieve seamless data integration, transformation, and analytics within the Microsoft data ecosystem.
Paul Andrew
Co-Founder & CTO of Cloud Formations | Microsoft MVP
Derby, United Kingdom