Session
Backing Up Apache Iceberg Tables Across Environments Using Project Nessie
This talk describes Wells Fargo's approach to backing up and synchronizing Apache Iceberg tables across environments, using Project Nessie as a catalog-level control plane. By combining object storage replication with Nessie’s Git-like metadata versioning, we demonstrate how production Iceberg tables can be continuously mirrored into non-production catalogs.
The architecture consists of two coordinated replication layers:
1. Storage-Layer Replication
All Iceberg table files, including Parquet data files, manifests, and metadata JSON, are replicated from production S3 to non-production S3 using standard enterprise tooling (rclone, distcp, or object-store replication).
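As a rough illustration of the storage layer, the sketch below builds an rclone invocation for mirroring a warehouse prefix. It is not the tooling shown in the talk: the remote names and paths (`prod-s3:lake/warehouse`, `nonprod-s3:lake/warehouse`) and the helper name `build_rclone_sync` are hypothetical, and the sketch defaults to a dry run because `rclone sync` deletes destination files that are absent from the source.

```python
from typing import List


def build_rclone_sync(source: str, dest: str, dry_run: bool = True) -> List[str]:
    """Build an `rclone sync` command that makes `dest` match `source`.

    `source` and `dest` are rclone remote paths; the values used in the
    example below are hypothetical placeholders, not real remotes.
    """
    # --checksum compares files by checksum and size instead of
    # modification time and size.
    cmd = ["rclone", "sync", source, dest, "--checksum"]
    if dry_run:
        # --dry-run reports what would change without copying or deleting.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Hypothetical remotes; pass the list to subprocess.run(cmd, check=True)
    # once the dry-run output has been reviewed.
    print(" ".join(build_rclone_sync("prod-s3:lake/warehouse",
                                     "nonprod-s3:lake/warehouse")))
```

Returning the command as a list rather than a shell string keeps it safe to hand to `subprocess.run` without shell quoting.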
2. Catalog-Layer Replication with Nessie
- Production Nessie stores the authoritative Iceberg metadata.
- Non-production Nessie runs as a separate instance.
- Nessie’s MongoDB collections (objs2, refs2) are periodically synchronized across environments.
Once both layers have synchronized, the non-production Nessie catalog mirrors production as of the most recent sync.
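The catalog-layer sync could look roughly like the sketch below, which upserts every document from the production collections into their non-production counterparts. This is an assumption about the mechanics, not the implementation presented in the talk: the helper names `mirror_collection` and `mirror_nessie` are hypothetical, and the code relies only on the `find()` and `replace_one(..., upsert=True)` methods that pymongo collections provide. Copying objs2 before refs2 is a design guess, intended to keep references from pointing at objects that have not arrived yet.

```python
from typing import Any, Dict, Sequence


def mirror_collection(source: Any, target: Any) -> int:
    """Upsert every document from `source` into `target`, keyed by `_id`.

    Both arguments are expected to behave like pymongo collections.
    Documents that exist only in `target` are left untouched.
    """
    copied = 0
    for doc in source.find():
        target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        copied += 1
    return copied


def mirror_nessie(
    prod_db: Any,
    nonprod_db: Any,
    collections: Sequence[str] = ("objs2", "refs2"),
) -> Dict[str, int]:
    """Mirror the Nessie collections in order and return per-collection counts."""
    counts: Dict[str, int] = {}
    for name in collections:
        # Databases are indexed by collection name, as in pymongo.
        counts[name] = mirror_collection(prod_db[name], nonprod_db[name])
    return counts
```

With pymongo installed, `prod_db` and `nonprod_db` would be database handles such as `MongoClient(uri)["nessie"]`, where both the URI and the database name are placeholders. A full mirror would also need to remove documents that no longer exist in production and to coordinate with the storage-layer replication, neither of which this sketch attempts.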
Vineel Arekapudi
Engineering Data Platforms from Storage to API, Senior Data Engineer Consultant at Wells Fargo
Chattanooga, Tennessee, United States