Session

Working with Apache Iceberg Without Spark: The Rise of Lightweight Query Engines

Apache Spark has long been the go-to engine for working with data, but do you really need a full-blown distributed system for every workload? What if you could ingest, query, and maintain Iceberg tables with lighter, faster, and more efficient alternatives?

In this talk, we’ll explore how a range of powerful query engines like DuckDB, DataFusion, PyIceberg, and Trino are changing the game by offering simpler, cheaper, and often faster ways to interact with Iceberg tables. We’ll cover:
✅ Why not Spark?
✅ Ingesting and querying Iceberg tables without spinning up a massive Spark cluster.
✅ Benchmarking different query engines like DuckDB, DataFusion, PyIceberg, and Trino.
✅ PyIceberg for observability: how to track table metadata, schema evolution, partition layouts, and snapshot history without Spark.
✅ The future of metadata operations with Rust: compaction, manifest rewrites, snapshot expiration, and Puffin files.

If you thought Spark was the only way to work with Iceberg, think again. Join us to explore how small engines are making big waves in the data lakehouse world! 🚀

Amit Gilad

Lakesphere, CTO
