Single-node technologies vs. Spark in Microsoft Fabric: Choosing the right tool for the job
In the early days of big data, distributed computing frameworks like Apache Spark and Hadoop became the de facto standards for processing massive datasets, and their ability to scale across clusters made them essential for large-scale data challenges. In today's diverse data landscape, however, not every dataset qualifies as "big data." For many workloads, single-node processing tools like Polars and DuckDB are compelling alternatives, offering strong performance, simplicity, and far lower overhead than distributed frameworks.
Microsoft Fabric gives you the opportunity to draw on both worlds. By supporting Python notebooks within its ecosystem, Fabric lets you build and run pipelines with these modern single-node technologies, so you can choose the most efficient tool for each workload.
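To make this concrete, here is a minimal sketch of what a single-node pipeline in a Fabric Python notebook might look like. The table path, table name, and column names are illustrative assumptions, not a prescribed layout; in a Fabric Python notebook an attached lakehouse is typically mounted under /lakehouse/default/.

```python
import duckdb
import polars as pl

# Assumption: an attached lakehouse mounted at /lakehouse/default/
# contains a Delta table named "sales" with "region" and "amount"
# columns. Both the path and the schema are illustrative.
TABLE_PATH = "/lakehouse/default/Tables/sales"

# Polars: lazily scan the Delta table and aggregate in-process,
# with no cluster or Spark session to spin up.
orders = (
    pl.scan_delta(TABLE_PATH)
    .group_by("region")
    .agg(pl.col("amount").sum().alias("total_amount"))
    .collect()
)

# DuckDB: query the Polars DataFrame directly with SQL, in the same
# process, via DuckDB's replacement scans over local variables.
top_regions = duckdb.sql(
    "SELECT region, total_amount FROM orders "
    "ORDER BY total_amount DESC LIMIT 5"
).pl()

print(top_regions)
```

Note that pl.scan_delta relies on the deltalake package being available in the notebook environment.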
In this session, we will:
- Examine the evolution of data processing, contrasting distributed frameworks like Spark with single-node solutions.
- Explore how technologies like Polars and DuckDB operate, their strengths, and how they compare to Spark in performance and scalability (see the side-by-side sketch after this list).
- Evaluate use cases to determine which approach, distributed or single-node, fits each scenario best.
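As a point of comparison, the same aggregation expressed against Spark might look like the sketch below. In a Fabric Spark notebook a `spark` session is prebuilt and Delta support is built in; the explicit session builder is only needed to run this elsewhere, and the table path and columns are the same illustrative assumptions as above.

```python
from pyspark.sql import SparkSession, functions as F

# In a Fabric Spark notebook a `spark` session already exists; this
# builder call is only for running the sketch outside Fabric.
spark = SparkSession.builder.appName("region-totals").getOrCreate()

# Same aggregation as the Polars sketch above, but routed through
# Spark's distributed engine: the query plan is split into tasks and
# scheduled onto executors, even when only one node is available.
result = (
    spark.read.format("delta").load("/lakehouse/default/Tables/sales")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_amount"))
    .orderBy(F.desc("total_amount"))
)

result.show(5)
```

For small and medium datasets, the scheduling and shuffle machinery that lets Spark scale out is pure overhead, which is exactly the trade-off this session evaluates.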