Empowering Lakehouse Solutions with Apache Arrow and Python Notebooks in Microsoft Fabric
When Microsoft Fabric was released, it introduced Apache Spark as its default engine for general data processing. While Spark is powerful for big data scenarios, it is often a poor fit for small and medium-sized workloads, due to both cost and the performance lost to cluster startup and administrative overhead.
Since then, Microsoft has introduced a non-Spark compute option: Python Notebooks. This makes it easier than ever to build solutions using Python alongside technologies like Apache Arrow, DuckDB, and Polars.
In this session, we’ll dive into how Python workloads leverage Apache Arrow under the hood in Microsoft Fabric to handle data transformations and analytics.
We’ll explore practical examples where Python with Apache Arrow outperforms Spark, and demonstrate how Apache Arrow bridges the gap between high speed and flexible development. Finally, we’ll examine how Apache Arrow fits into a metadata-driven Lakehouse architecture in Microsoft Fabric.
We will cover:
* The difference between Python Notebooks and single-node Spark.
* When to use Python Notebooks and when to use Spark Notebooks.
* Where to use Python Notebooks in a metadata-driven Lakehouse.
* A brief introduction to tooling and to moving workloads between Python Notebooks and Spark Notebooks.
* How to avoid overloading the Lakehouse tech stack with Python technologies, with an introduction to Apache Arrow.
* Cost considerations.
After this session, attendees will understand how to apply both Python Notebooks and Spark Notebooks to get the most out of Fabric for data processing.