Session
Don't let your Apache Iceberg data slip through the cracks
Did you know that ingesting data into Apache Iceberg tables on S3-backed storage that relies on asynchronous replication can lead to unexpected data loss and leave you with broken Iceberg tables? This is particularly important to consider if you're using a managed or hosted private cloud.
In this talk, we'll unravel this often-overlooked issue, dissect its root causes, and explore potential mitigation strategies. I'll also share how our team at Bloomberg solved this problem in our data pipeline by using external metadata stores and replay capabilities.
This session should leave you with practical insights and a fresh perspective on ensuring data integrity in Iceberg. Don't let your data vanish!

Priyansh Agrawal
Senior Software Engineer @Bloomberg
London, United Kingdom