Single-node technologies vs. Spark in Microsoft Fabric: Choosing the right tool for the job
In the early days of big data, distributed computing frameworks like Apache Spark and Hadoop became the de facto standards for processing massive datasets, and their ability to scale across clusters made them essential for large-scale data challenges. In today's diverse data landscape, however, not every dataset qualifies as "big data." For many workloads, single-node processing tools like Polars and DuckDB are compelling alternatives, offering strong performance, simplicity, and far lower overhead than distributed frameworks.
Microsoft Fabric gives you the opportunity to draw on both worlds. By supporting Python notebooks within its ecosystem, Fabric lets you build and run pipelines with these modern single-node technologies, so you can choose the most efficient tool for each workload.
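To make this concrete, here is a minimal sketch of what a single-node pipeline in a Fabric Python notebook might look like. The table path, table name, and column names are illustrative assumptions, not a prescribed layout; in a Fabric Python notebook an attached lakehouse is typically mounted under /lakehouse/default/.

```python
import duckdb
import polars as pl

# Assumption: an attached lakehouse mounted at /lakehouse/default/
# contains a Delta table named "sales" with "region" and "amount"
# columns. Both the path and the schema are illustrative.
TABLE_PATH = "/lakehouse/default/Tables/sales"

# Polars: lazily scan the Delta table and aggregate in-process,
# with no cluster or Spark session to spin up.
orders = (
    pl.scan_delta(TABLE_PATH)
    .group_by("region")
    .agg(pl.col("amount").sum().alias("total_amount"))
    .collect()
)

# DuckDB: query the Polars DataFrame directly with SQL, in the same
# process, via DuckDB's replacement scans over local variables.
top_regions = duckdb.sql(
    "SELECT region, total_amount FROM orders "
    "ORDER BY total_amount DESC LIMIT 5"
).pl()

print(top_regions)
```

Note that pl.scan_delta relies on the deltalake package being available in the notebook environment.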
In this session, we will:
- Examine the evolution of data processing, contrasting distributed frameworks like Spark with single-node solutions.
- Explore how technologies like Polars and DuckDB operate, their strengths, and how they compare to Spark in performance and scalability (see the side-by-side sketch after this list).
- Evaluate use cases to determine which approach, distributed or single-node, fits each scenario best.
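As a point of comparison, the same aggregation expressed against Spark might look like the sketch below. In a Fabric Spark notebook a `spark` session is prebuilt and Delta support is built in; the explicit session builder is only needed to run this elsewhere, and the table path and columns are the same illustrative assumptions as above.

```python
from pyspark.sql import SparkSession, functions as F

# In a Fabric Spark notebook a `spark` session already exists; this
# builder call is only for running the sketch outside Fabric.
spark = SparkSession.builder.appName("region-totals").getOrCreate()

# Same aggregation as the Polars sketch above, but routed through
# Spark's distributed engine: the query plan is split into tasks and
# scheduled onto executors, even when only one node is available.
result = (
    spark.read.format("delta").load("/lakehouse/default/Tables/sales")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
    .orderBy(F.desc("total_amount"))
)

result.show(5)
```

For small and medium datasets, the scheduling and shuffle machinery that lets Spark scale out is pure overhead, which is exactly the trade-off this session evaluates.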