Outperform Spark with Python Notebooks in Fabric
When Microsoft Fabric was released, it came with Apache Spark out of the box. Spark's support for multiple programming languages opened up possibilities for creating data-driven, automated lakehouses. On the other hand, Spark's core strength, scaling out to handle large amounts of data, is in many cases oversized, less performant, and more costly for trivial workloads.
With Python Notebooks, we have a better tool for handling metadata, automation, and processing of more trivial workloads, while still having the option to use Spark Notebooks for handling more demanding processing.
We will cover:
* The difference between Python Notebooks and a single-node Spark cluster, and why Spark Notebooks are more costly and less performant for certain types of workloads.
* When to use Python Notebooks and when to use Spark Notebooks.
* Where to use Python Notebooks in a metadata-driven Lakehouse.
* A brief introduction to tooling, and how to move workloads between Python Notebooks and Spark Notebooks.
* How to avoid overloading the Lakehouse tech stack with Python technologies.
* Costs.
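To illustrate the kind of workload move the bullets describe, here is a minimal, hypothetical sketch: a small aggregation done single-node with pandas in a Python Notebook instead of spinning up a Spark session. The data, column names, and the Spark equivalent shown in the comment are illustrative assumptions, not taken from the session itself.

```python
import pandas as pd

# Hypothetical sales data. In a Fabric Python Notebook this would
# typically come from a Lakehouse Delta table rather than be built
# in memory.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, 250.0, 50.0, 75.0],
})

# Single-node aggregation -- roughly the pandas equivalent of
# spark.table("sales").groupBy("region").sum("amount"), but without
# the overhead of starting a Spark session for a trivial dataset.
totals = sales.groupby("region", as_index=False)["amount"].sum()
print(totals)
```

For small tables like this, the single-node version avoids cluster startup and distributed-shuffle overhead entirely; the same logic can be ported back to a Spark Notebook when data volumes grow.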
After this session, attendees will understand how to apply Python Notebooks alongside Spark Notebooks to get the most out of a Fabric Capacity for data processing.