Building Analytics Pipelines in Python with DuckDB and Parquet

Many Python applications rely on operational databases and CSV exports for reporting and analytics. These approaches work at first, but as data volumes grow they become slow and hard to scale: CSV files carry no type information and have to be parsed in full for every query.

In this talk, we'll explore a modern analytics workflow built around Python, Parquet, and DuckDB. We'll examine why traditional CSV-based pipelines become bottlenecks, how columnar storage formats such as Parquet cut the amount of data a query has to read, and how DuckDB runs fast analytical queries directly on files without requiring a dedicated data warehouse.

Through practical examples, we'll walk through the process of extracting data, transforming it into Parquet datasets, and running analytical workloads using familiar SQL from within Python applications. We'll also compare performance characteristics across different approaches and discuss when these tools are the right choice for production systems.

Topics covered include:
• Why CSV-based analytics pipelines struggle at scale
• Understanding Parquet and columnar storage fundamentals
• Querying Parquet files directly with DuckDB
• Building efficient analytics workflows in Python
• Leveraging predicate pushdown and column pruning
• Performance comparisons and benchmarking
• Choosing the right architecture for analytical workloads

Attendees will leave with a practical understanding of modern analytics tooling and of how to build lightweight, high-performance analytics pipelines in Python without introducing complex infrastructure.

Key Takeaways
• Understand the limitations of traditional CSV workflows
• Learn how Parquet improves storage and query performance
• Use DuckDB for fast analytical processing directly from Python
• Build scalable analytics pipelines with minimal operational overhead
• Apply modern data engineering techniques to existing Python applications

Muhammed Mizaj

Product Engineer at UST Global

Thiruvananthapuram, India
