Backing Up Apache Iceberg Tables Across Environments Using Project Nessie

This talk presents Wells Fargo's approach to backing up and synchronizing Apache Iceberg tables across environments, using Project Nessie as a catalog-level control plane. By combining object storage replication with Nessie’s Git-like metadata versioning, we show how production Iceberg tables can be mirrored into non-production catalogs on an ongoing basis.

The architecture consists of two coordinated replication layers:
1. Storage-Layer Replication
All Iceberg table files, including Parquet data files, manifest files, and metadata JSON files, are replicated from production S3 buckets into non-production S3 buckets using standard enterprise tooling (rclone, distcp, or native object-store replication).
2. Catalog-Layer Replication with Nessie
The production Nessie instance stores the authoritative Iceberg metadata, while non-production Nessie runs as a separate instance. Nessie’s MongoDB collections (objs2 and refs2) are periodically synchronized from production into the non-production instance.
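The two layers have to be applied in a sensible order: a non-production branch should never be advanced to metadata whose files or catalog objects have not arrived yet. The following is a minimal, self-contained Python sketch of one sync cycle under stated assumptions. Object stores and MongoDB collections are modeled as in-memory dicts keyed by object key or document `_id`. The functions `sync_storage`, `sync_collection` and `run_sync_cycle`, and the document shapes (including the `pointer` field), are hypothetical placeholders, not rclone, Nessie, or MongoDB APIs and not Nessie's actual schema. Because Iceberg files are written once and never modified, the storage step only copies keys that are missing at the destination.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-ins: an object store maps key -> bytes,
# a MongoDB collection maps document _id -> document.
ObjectStore = Dict[str, bytes]
Collection = Dict[str, Dict[str, Any]]


def sync_storage(src: ObjectStore, dst: ObjectStore) -> int:
    """Copy objects missing from dst. Iceberg files are treated as write-once,
    so an existing key is assumed to already hold identical content."""
    copied = 0
    for key, body in src.items():
        if key not in dst:
            dst[key] = body
            copied += 1
    return copied


def sync_collection(src: Collection, dst: Collection) -> int:
    """Upsert every source document into dst by _id.
    Documents deleted in src are not removed here (assumed policy)."""
    changed = 0
    for doc_id, doc in src.items():
        if dst.get(doc_id) != doc:
            dst[doc_id] = dict(doc)
            changed += 1
    return changed


def run_sync_cycle(prod: dict, nonprod: dict) -> dict:
    """One cycle: table files first, then objs2, then refs2, so references
    are only advanced after everything they can point to already exists."""
    return {
        "files": sync_storage(prod["s3"], nonprod["s3"]),
        "objs2": sync_collection(prod["objs2"], nonprod["objs2"]),
        "refs2": sync_collection(prod["refs2"], nonprod["refs2"]),
    }


if __name__ == "__main__":
    prod = {
        "s3": {"warehouse/t1/metadata/00001.metadata.json": b"{}",
               "warehouse/t1/data/a.parquet": b"PAR1"},
        "objs2": {"c1": {"_id": "c1", "type": "commit"}},
        "refs2": {"main": {"_id": "main", "pointer": "c1"}},
    }
    nonprod = {"s3": {}, "objs2": {}, "refs2": {}}
    print(run_sync_cycle(prod, nonprod))  # {'files': 2, 'objs2': 1, 'refs2': 1}
    print(run_sync_cycle(prod, nonprod))  # {'files': 0, 'objs2': 0, 'refs2': 0}
```

Running the cycle a second time copies nothing, which is what makes it safe to schedule repeatedly. A real deployment would replace the dict operations with the replication tool and MongoDB upserts of its choice.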

Once both layers have been synchronized, the non-production Nessie catalog becomes a true mirror of production.
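One way to check the "true mirror" claim after a cycle is to compare the two environments and look for dangling references. The sketch below is a hedged illustration using the same assumed in-memory model (dicts standing in for S3 buckets and the objs2/refs2 collections). The function `mirror_problems` and the `pointer` field are hypothetical names chosen for this example, not part of Nessie.

```python
from typing import Any, Dict, List


def mirror_problems(prod: Dict[str, Dict[str, Any]],
                    nonprod: Dict[str, Dict[str, Any]],
                    pointer_field: str = "pointer") -> List[str]:
    """Return sorted discrepancies; an empty list means the in-memory
    non-production copy mirrors production."""
    problems = []
    # Every production file must exist in the non-production store.
    for key in set(prod["s3"]) - set(nonprod["s3"]):
        problems.append(f"missing file: {key}")
    # Every production catalog document must match its non-production copy.
    for name in ("objs2", "refs2"):
        for doc_id, doc in prod[name].items():
            if nonprod[name].get(doc_id) != doc:
                problems.append(f"{name} out of date: {doc_id}")
    # Every non-production reference must resolve to an object that exists
    # in non-production objs2; otherwise readers would hit a dangling ref.
    for ref_id, ref in nonprod["refs2"].items():
        target = ref.get(pointer_field)
        if target not in nonprod["objs2"]:
            problems.append(f"dangling ref: {ref_id} -> {target}")
    return sorted(problems)


if __name__ == "__main__":
    prod = {"s3": {"t1/data/a.parquet": b"PAR1"},
            "objs2": {"c1": {"_id": "c1"}},
            "refs2": {"main": {"_id": "main", "pointer": "c1"}}}
    # A refs-first copy without its objects or files: not a valid mirror.
    nonprod = {"s3": {}, "objs2": {},
               "refs2": {"main": {"_id": "main", "pointer": "c1"}}}
    for problem in mirror_problems(prod, nonprod):
        print(problem)
```

The example shows why copying refs2 ahead of objs2 and the table files is unsafe: the reference itself matches production, yet it points at data the non-production side does not have.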

Vineel Arekapudi

Engineering Data Platforms from Storage to API, Senior Data Engineer Consultant at Wells Fargo

Chattanooga, Tennessee, United States
