
Amit Gilad
Lakesphere, CTO
Seasoned data engineer with over eight years of experience architecting and managing large-scale data systems. Currently CTO at Lakesphere, an optimization platform for the data lake. Previously, Amit played an instrumental role in spearheading Cloudinary's transition to Apache Iceberg, the open table format, leveraging his deep expertise in optimizing data storage, enhancing data retrieval, and ensuring seamless data operations in cloud environments.
The Great Query Migration: Transitioning from a Single to a Multi-Query-Engine Architecture
In today's data-driven world, the ability to leverage different query engines for diverse analytical needs is crucial for maximizing operational efficiency. This talk explores our transition to Apache Iceberg, highlighting how it has enabled us to use multiple query engines, such as Apache Spark, Athena, Trino, and Snowflake, for various use cases, and the challenges that arise when running more than one engine.
Best Practices and Insights When Migrating to Apache Iceberg for Data Engineers
Amit Gilad from Cloudinary will share how they expanded their data lake to use Apache Iceberg. The talk will demonstrate how moving from Snowflake to an open table format allowed them to reduce storage costs and leverage different query and processing engines to run more powerful analytics at scale.
You Migrated to the Lake—But What Now? Exploring the Future and What's Still Missing
So, you’ve made the leap—you migrated your data to the data lake. You’ve embraced open table formats, ditched the old-school data warehouse, and finally feel like you’re in control of your data. But now comes the hard part: what’s next?
In this talk, we’ll take a critical look at the post-migration reality. Sure, you’ve got Iceberg, Delta, or Hudi running in production, but:
• Are your queries actually faster and cheaper? Or did you just shift complexity from one place to another?
• Are you truly vendor-agnostic? Or are you still locked in by query engines and metadata layers?
• Have you solved table maintenance, compaction, and governance? Or are you just ignoring those problems for now?
• What’s still missing? What innovations are needed to make data lakes truly operational—not just storage dumps?
We’ll explore the next wave of lake evolution, from adaptive clustering and indexing to real-time processing and transactional consistency. Whether you’re running on Snowflake, Databricks, Trino, or DuckDB, this session will challenge assumptions and spark fresh ideas about what’s still broken—and where we go from here.
If you thought migrating to the lake was the hard part, think again. The real work starts now. 🚀
You Migrated to Apache Iceberg—Now What? Speed, Efficiency, and Cost in the Real World
You’ve moved to Apache Iceberg—but is your data lake really delivering on its promises? Are your queries running faster, or are you still hitting bottlenecks? Is your data truly optimized, or are inefficient layouts and large file scans inflating costs?
This session dives into real-world Iceberg performance tuning, covering:
✅ Choosing the Right Catalog for Your Needs – Understanding the trade-offs between catalogs and how your choice impacts query performance, governance, and multi-engine compatibility.
✅ The impact of metadata maintenance operations (compaction, expire_snapshots, remove_orphan_files, rewrite_data_files) on cost and query performance, and when and how to run them efficiently (see the sketch after this list).
✅ How Puffin files improve query efficiency with advanced statistics.
✅ Best practices for optimizing workloads across query engines: Trino, Spark, Datafusion and DuckDB.
✅ Cost-saving strategies to reduce compute overhead and unnecessary scans through smarter metadata pruning, partitioning, and file management.
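As a concrete taste of the maintenance item above, here is a minimal sketch of how these operations can be invoked through Spark's built-in Iceberg procedures. The catalog name my_catalog and the table db.events are placeholders, and exact argument values depend on your tables and Iceberg version:

```python
# Minimal sketch: Iceberg table maintenance via Spark SQL procedures.
# Assumes an active SparkSession configured with an Iceberg catalog named
# "my_catalog"; the table name "db.events" is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files into larger ones (bin-packing strategy).
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(
        table => 'db.events', strategy => 'binpack')
""")

# Expire old snapshots to trim metadata and allow data-file cleanup.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
        table => 'db.events', older_than => TIMESTAMP '2024-01-01 00:00:00')
""")

# Remove files no longer referenced by any snapshot.
spark.sql("CALL my_catalog.system.remove_orphan_files(table => 'db.events')")
```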
Migrating to Iceberg was just the beginning—now it’s time to master metadata management, optimize query performance, and make every compute dollar count. 🚀
Working with Apache Iceberg Without Spark: The Rise of Lightweight Query Engines
Apache Spark has long been the go-to engine for working with data, but do you really need a full-blown distributed system for every workload? What if you could ingest, query, and maintain Iceberg tables with lightweight, faster, and more efficient alternatives?
In this talk, we’ll explore how lightweight but powerful query engines like DuckDB, DataFusion, PyIceberg, and Trino are changing the game by offering simpler, cheaper, and often faster ways to interact with Iceberg tables. We’ll cover:
✅ Why not Spark?
✅ Ingesting and querying Iceberg tables without spinning up a massive Spark cluster.
✅ Benchmarking different query engines: DuckDB, DataFusion, PyIceberg, and Trino.
✅ PyIceberg for observability: how to track table metadata, schema evolution, partition layouts, and snapshot history without Spark (see the sketch after this list).
✅ The future of metadata operations in Rust: compaction, manifest rewrites, snapshot expiration, and Puffin files.
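To give a flavor of the observability item above, here is a minimal sketch that uses PyIceberg to inspect a table and DuckDB to query it, with no Spark involved. The catalog name default and the table db.events are assumptions for illustration:

```python
# Minimal sketch: inspecting and querying an Iceberg table without Spark.
# Assumes a PyIceberg catalog configured as "default" (e.g. in ~/.pyiceberg.yaml)
# and a table named "db.events"; both names are placeholders.
import duckdb
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
table = catalog.load_table("db.events")

# Observability: schema, partition layout, and snapshot history.
print(table.schema())
print(table.spec())
for entry in table.history():
    print(entry.snapshot_id, entry.timestamp_ms)

# Query: scan to Arrow and run SQL on it with DuckDB, entirely in-process.
arrow_table = table.scan().to_arrow()
print(duckdb.sql("SELECT count(*) FROM arrow_table"))
```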
If you thought Spark was the only way to work with Iceberg, think again. Join us to explore how small engines are making big waves in the data lakehouse world! 🚀
Turbocharging Your Data Lake: Real-World Apache Iceberg Performance Tuning
Want to make your Apache Iceberg tables blazing fast? Join me for an in-depth session packed with practical strategies to fine-tune your data lake for top-tier performance and scalability. We’ll walk through critical table maintenance procedures, such as metadata optimization, handling the small-file problem through smart compaction, and visualizing bottlenecks in your tables, alongside battle-tested best practices for both streaming and batch processing workloads.
We’ll also dive into a key question many teams overlook: does the catalog layer matter? Spoiler: it does. The choice of catalog has real implications for read/write performance and multi-engine compatibility.
Plus, discover how Puffin files are reshaping how metadata is stored and queried, unlocking new ways to accelerate your analytical workloads. Expect actionable insights, compelling benchmarks, and real-world takeaways to help you lower latency, reduce I/O, and keep compute costs in check.
Designing a Multi-Engine Lakehouse with Apache Iceberg: One Table, Many Engines
What if one Iceberg table could power your entire analytics stack, from interactive queries in DuckDB to batch processing in Spark, ad-hoc exploration in Trino, and real-time dashboards in StarRocks? This talk explores how Apache Iceberg enables true multi-engine interoperability through its spec-compliant design, catalog decoupling, and powerful metadata model. We’ll walk through architectural patterns for building a shared data layer across diverse compute engines without sacrificing consistency, performance, or flexibility. Whether you’re modernizing a legacy data warehouse or building a new lakehouse from scratch, this talk shows you how to get the most from Iceberg: write once, query everywhere.
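To illustrate the pattern, here is a minimal sketch in which Spark handles writes to an Iceberg table and DuckDB reads the very same table through its iceberg extension. The catalog name my_catalog, the warehouse path, and the table name are all placeholders, and the DuckDB side additionally assumes S3 credentials are configured:

```python
# Minimal sketch: one Iceberg table, two engines.
# Assumes a SparkSession configured with an Iceberg catalog named "my_catalog"
# backed by s3://bucket/warehouse; all names here are placeholders.
import duckdb
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Engine 1: Spark handles batch writes.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.db.events
        (id BIGINT, ts TIMESTAMP) USING iceberg
""")
spark.sql("INSERT INTO my_catalog.db.events VALUES (1, current_timestamp())")

# Engine 2: DuckDB reads the same table in-process via its iceberg extension.
con = duckdb.connect()
con.execute("INSTALL iceberg; LOAD iceberg;")
print(con.sql(
    "SELECT count(*) FROM iceberg_scan('s3://bucket/warehouse/db/events')"))
```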