Miles Cole

Spark and Lakehouse Evangelist | Principal Program Manager @ Microsoft, Fabric CAT

Littleton, Colorado, United States

Miles is a Principal Program Manager at Microsoft. While a Spark specialist by role, he has deep knowledge and experience with everything from lakehouse to data warehouse design, tuning, CI/CD automation, and API development.

He started his career writing VBScript to automate business processes and reporting. When Power BI was released, he quickly became shadow IT. After successfully implementing Power BI and starting a BI center of excellence, he was onboarded into IT and given the opportunity to build his company's first cloud data warehouse. Now, after dozens of implementations and with an ever-growing love of all things Spark and OSS, he loves to share what he's learned and talk about modern data architectures.

Area of Expertise

Information & Communications Technology

Topics

Microsoft Fabric
Apache Spark
CICD
Azure DevOps Pipelines
Microsoft Power BI
Azure Data Engineering

Spark Acceleration with the Native Execution Engine and Autotune in Fabric

Learn directly from the Microsoft Fabric Engineering Product Group about two powerful features that enhance Spark performance in your Lakehouse: the Native Execution Engine and Autotune.

The Native Execution Engine improves the speed of Spark jobs using efficient technologies like columnar data formats and vectorized processing, making it faster to process and analyze large datasets. Autotune simplifies Spark configuration by automatically adjusting settings based on your workload, saving time and effort while improving performance.

This session will provide a practical, in-depth look at how these tools work and how they can help you handle demanding data engineering and machine learning tasks. Whether you’re building pipelines, optimizing ETL jobs, or running advanced analytics, you’ll gain actionable insights from the team that developed these features.

Harnessing Apache Spark for Next-Gen Analytics in Microsoft Fabric

Unlock the full potential of Apache Spark in Microsoft Fabric with this comprehensive, full-day workshop. Tailored for data engineers and data developers, this session offers hands-on experience creating and optimizing Spark workflows while building a data analytics platform with a medallion architecture on industry-standard Delta Lake. Dive deep into Spark's capabilities in data transformation, parallel processing, job scheduling, and performance tuning, all within the Microsoft Fabric ecosystem. This workshop will empower Spark newcomers and beginners to tackle complex data challenges with confidence and build an AI-ready data analytics platform.

By the end of the workshop, you will be able to:
-Develop Apache Spark-based applications in Microsoft Fabric.
-Utilize Delta Lake and Lakehouse to construct a medallion architecture for your data analytics platform.
-Use the rich authoring and development experience in Fabric Notebook and Visual Studio Code: gain proficiency in writing and executing Spark code within notebooks, and learn useful notebook functions for a better authoring experience (live versioning, display, notebookutils).
-Use your preferred programming language to build data analytics applications and leverage your existing SQL skills to quickly get started with Spark.
-Manage, monitor, and debug your Spark applications in Microsoft Fabric, debugging Spark jobs with notebook in-context monitoring, the Spark details page, and the OSS Spark UI.
-Discover how to integrate Spark seamlessly with other Fabric workloads such as Data Factory, Data Warehouse, and Power BI.
-Discover how to leverage public and/or custom libraries to extend the functionality of your Spark applications by using Library Management.
-Extra/Bonus - Learn tips and tricks to optimize your Spark applications and understand how to scale Spark applications to handle large datasets efficiently.

Designing Dedicated SQL Pools for Scale and Performance

Deploying a Dedicated SQL Pool in the cloud is easy; however, so is misusing its distributed SQL engine. How do you know you're getting the most DWUs for your Azure spend? Could your queries run faster? Are you writing optimized SQL? How does Synapse even work under the hood?

In this session I'll guide you through the key tenets of building a lightning-fast Synapse Dedicated SQL Pool. From table distributions, indexes, statistics, database settings, and data movement patterns to SQL constructs, this session will teach you the options available and the pitfalls to avoid, and showcase the impact via industry-standard benchmarks.
