Navigating Icebergs: Don't Be the Next Titanic!
The advantages that Apache Iceberg brings to the Data Lake ecosystem are undeniable. However, it's important to remember that behind the scenes, immutable Parquet files reside in your lake. Consequently, the usual culprits are ever-present, ready to cause trouble; data skew and data fragmentation are just a couple of examples.
With that in mind, wouldn't it be great to have a plug-and-play solution for monitoring the health of your tables, one that integrates seamlessly with your OpenTelemetry-compliant observability platform?
In this presentation, we'll introduce a straightforward yet invaluable (and free) solution to keep your iceberg (tables) on your radar!
Syncing the Iceberg: Real-Time sailing at Terabyte Latitudes
Iceberg, along with other table formats, promises ACID properties atop read-optimized and open file formats like Apache Parquet. But is achieving this promise feasible when synchronising tables in near real-time? Will optimistic concurrency remain the optimal choice? What trade-offs will we encounter? Let's embark on a journey across glacial seas and find out!
DataOps in action with Nessie, Iceberg and Great Expectations
In this talk, I will present how we used Nessie, Iceberg, and Great Expectations to build a DataOps pipeline that ensures data quality and avoids “Datastrophes”.
Subsurface LIVE 2024 Sessionize Event
Subsurface LIVE 2023 Sessionize Event