Session
Native Iceberg Scans at Rust Speed: How DataFusion-Comet Achieves Faster Query Performance
Apache Spark processes petabytes of Iceberg data daily, but a hidden tax plagues every query. JVM costs from garbage collection, memory pressure, and slow Arrow FFI crossings silently add 50-70% execution overhead on scan-heavy workloads. Your Iceberg tables are fast; your execution engine isn't.
DataFusion-Comet eliminates this tax through a Rust-based native Iceberg scan that bypasses Spark's DataSource V2 API. Spark's Iceberg catalog handles query planning while iceberg-rust executes parallel file reads via Apache Arrow, with no JVM involved in the read path. Our IcebergScanExec operator delivers dramatic results on TPC-H.
One configuration flag activates native execution on existing Iceberg tables with zero code changes. Attendees will learn the architecture bridging Spark planning with Rust execution, understand current limitations (Spec V3, ORC/Avro fallback), and see the roadmap toward complex types and Merge-on-Read. Most importantly: you can start accelerating your Iceberg queries today.