You Migrated to Apache Iceberg—Now What? Speed, Efficiency, and Cost in the Real World

You’ve moved to Apache Iceberg—but is your data lake really delivering on its promises? Are your queries running faster, or are you still hitting bottlenecks? Is your data truly optimized, or are inefficient layouts and large file scans inflating costs?
This session dives into real-world Iceberg performance tuning, covering:

✅ Choosing the right catalog for your needs: the trade-offs between catalogs and how your choice affects query performance, governance, and multi-engine compatibility.
✅ How metadata management operations (compaction, expire_snapshots, remove_orphan_files, rewrite_data_files) affect cost and query performance, and when and how to run them efficiently.
✅ How Puffin files improve query efficiency with advanced statistics.
✅ Best practices for optimizing workloads across query engines: Trino, Spark, DataFusion, and DuckDB.
✅ Cost-saving strategies to reduce compute overhead and unnecessary scans through smarter metadata pruning, partitioning, and file management.

Migrating to Iceberg was just the beginning—now it’s time to master metadata management, optimize query performance, and make every compute dollar count. 🚀

Amit Gilad

Lakesphere, CTO
