Vineel Arekapudi
Engineering Data Platforms from Storage to API, Senior Data Engineer Consultant at Wells Fargo
Chattanooga, Tennessee, United States
I work at the intersection of large-scale data engineering, cloud platforms, and applied AI. I currently build and lead modern data platforms at a major U.S. bank, where I design Lakehouse and streaming architectures that operate at multi-billion-record scale across cloud environments.
My background spans the full evolution of enterprise data systems, from mainframe and Teradata warehouses to cloud-native lakehouses built on Spark, Iceberg, Kafka, and Kubernetes. Over the past decade, my work has focused on building production-grade data platforms: high-throughput ingestion pipelines, real-time analytics systems, and ML-ready data infrastructure used by data scientists, analysts, and AI teams.
In addition to data engineering, I have deep experience in full-stack platform development using Java, Spring Boot, REST APIs, and modern front-end frameworks. This allows me to design data systems not just as pipelines, but as products — complete with APIs, services, governance layers, and developer tooling.
My current interests include open table formats (Apache Iceberg), lakehouse architecture, metadata-driven governance, and building scalable AI-ready data platforms. I enjoy sharing practical lessons from real production systems — what works, what breaks, and how to design data infrastructure that lasts.
Backing Up Apache Iceberg Tables Across Environments Using Project Nessie
In this talk, we walk through Wells Fargo's approach to backing up and synchronizing Apache Iceberg tables across environments, using Project Nessie as a catalog-level control plane. By combining object storage replication with Nessie’s Git-like metadata versioning, we demonstrate how production Iceberg tables can be continuously mirrored into non-production catalogs.
The architecture consists of two coordinated replication layers:
1. Storage-Layer Replication
All Iceberg table artifacts, including Parquet data files, manifests, and metadata JSON, are replicated from production S3 into non-production S3 using standard enterprise tooling (rclone, distcp, or object-store replication).
2. Catalog-Layer Replication with Nessie
Production Nessie stores authoritative Iceberg metadata
Non-production Nessie runs as a separate instance
Nessie’s MongoDB collections (objs2, refs2) are periodically synchronized across environments
Once both layers are synchronized, the non-production Nessie catalog becomes a true mirror of production.
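The two-layer flow described above can be sketched in a few lines of Python. The dicts below stand in for the prod/non-prod object stores and Nessie's MongoDB collections; all names are illustrative, and in production the storage layer would be handled by rclone/distcp/S3 replication and the catalog layer by syncing the Nessie backing database.

```python
def replicate_storage(prod_store: dict, nonprod_store: dict) -> list:
    """Layer 1: copy any Iceberg file (data, manifest, metadata JSON)
    that is missing or stale in non-prod. Returns the keys copied."""
    copied = []
    for key, blob in prod_store.items():
        if nonprod_store.get(key) != blob:
            nonprod_store[key] = blob
            copied.append(key)
    return copied

def replicate_catalog(prod_nessie: dict, nonprod_nessie: dict) -> None:
    """Layer 2: mirror Nessie's backing collections (e.g. objs2, refs2)
    so the non-prod catalog resolves the same commits and branch heads."""
    for collection in ("objs2", "refs2"):
        nonprod_nessie[collection] = dict(prod_nessie[collection])

# Example: prod has one new metadata file that non-prod is missing.
prod_s3 = {
    "warehouse/tbl/data/f1.parquet": b"rows",
    "warehouse/tbl/metadata/v2.json": b"meta",
}
nonprod_s3 = {"warehouse/tbl/data/f1.parquet": b"rows"}
print(replicate_storage(prod_s3, nonprod_s3))
# ['warehouse/tbl/metadata/v2.json']
```

The key design point is ordering: data files must land before the catalog sync, so that every snapshot the mirrored Nessie catalog references is already readable in non-production storage.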
Back to the Future: Time Travel in Microsoft Fabric for Iceberg-based Tables
This lightning talk takes a quick dive into the metadata layer of Iceberg to cover these topics:
- Overview of Fabric/Iceberg internal metadata tables (essentials for time travel)
- Time travel queries such as:
SELECT * FROM db.table.history;
SELECT * FROM db.table.snapshots;
SELECT * FROM db.table.files;
SELECT * FROM db.table.manifests;
SELECT * FROM db.table.partitions;
- Advanced Topics:
a. Rollback
b. Maintenance - e.g., compaction (rewrite_data_files), remove orphan files, expire snapshots
- CoW (Copy on Write) vs MoR (Merge on Read)
a. Default - V2 Copy on Write
b. V2 Merge on Read
c. V3 Merge on Read (the most capable, but query engines such as Dremio do not seem to support it yet)
- Nessie branching: branching at the catalog level
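As a taste of the rollback and maintenance topics above, the calls look roughly like the following in Spark SQL with the Iceberg runtime (procedure names come from Iceberg's built-in Spark procedures; catalog, table, snapshot ID, and timestamp are placeholders):

```sql
-- Roll the table back to an earlier snapshot (ID taken from db.table.snapshots)
CALL catalog.system.rollback_to_snapshot('db.table', 1234567890123456789);

-- Compact small files
CALL catalog.system.rewrite_data_files(table => 'db.table');

-- Remove files no longer referenced by any table metadata
CALL catalog.system.remove_orphan_files(table => 'db.table');

-- Expire old snapshots to reclaim metadata and data files
CALL catalog.system.expire_snapshots(table => 'db.table', older_than => TIMESTAMP '2025-01-01 00:00:00');
```

Note that expiring snapshots trades away time-travel range for storage, so retention windows should be set with the queries above in mind.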
At the end of the talk, participants will leave with a better understanding of Fabric/Iceberg time travel and maintenance features.
Weaving RAGs into Fabric: A Governed Lakehouse Architecture for Enterprise AI Agents
In this session we look at how Wells Fargo implements a “RAG-as-a-Service” multi-agent architecture built on Microsoft Fabric, where logs are centralized in OneLake and Databricks serves as the execution layer for orchestrating sequential agents. The architecture follows a two-stage pattern inspired by real-world incident triage: a Log Retrieval Agent queries and assembles relevant context from Lakehouse tables using hybrid retrieval, and a Root Cause Processing Agent then consumes that context to generate structured summaries and recommended next steps. All intermediate outputs are persisted back into Fabric for governance, lineage, and observability.
Key Highlights:
Fabric OneLake as the governed context backbone for enterprise logs and metadata
Reusable RAG-as-a-Service layer exposing context retrieval and management APIs
Multi-agent orchestration: Log Retrieval Agent followed by Root Cause Processing Agent
Hybrid retrieval combining Lakehouse SQL filtering with semantic similarity search
Full lineage and auditability by persisting agent inputs and outputs back into Fabric tables
This session emphasizes a clear, enterprise-ready architecture rather than isolated AI demos.
Attendees will gain a concrete blueprint for implementing governed, multi-agent RAG workflows directly on Microsoft Fabric.
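A minimal sketch of the two-stage pattern above, with all names illustrative: real retrieval would combine Lakehouse SQL filtering with vector similarity search over OneLake tables, and the processing agent would prompt an LLM. Here both are stubbed so only the orchestration shape and the audit write-back are shown.

```python
def hybrid_retrieve(logs, service, query_terms):
    """Stage 1 (Log Retrieval Agent): SQL-style filter on service, then a
    crude term-overlap rank standing in for semantic similarity search."""
    filtered = [l for l in logs if l["service"] == service]
    scored = [(sum(t in l["message"] for t in query_terms), l) for l in filtered]
    return [l for score, l in sorted(scored, key=lambda s: -s[0]) if score > 0]

def root_cause_agent(context):
    """Stage 2 (Root Cause Processing Agent): consume retrieved context and
    emit a structured summary. A real implementation would call an LLM."""
    return {
        "evidence": [l["message"] for l in context],
        "summary": f"{len(context)} relevant log line(s) found",
    }

def run_pipeline(logs, service, query_terms, audit_table):
    context = hybrid_retrieve(logs, service, query_terms)
    result = root_cause_agent(context)
    # Persist intermediate inputs and outputs, mirroring the write-back into
    # Fabric tables for lineage and observability.
    audit_table.append({"context": context, "result": result})
    return result

logs = [
    {"service": "payments", "message": "timeout connecting to db"},
    {"service": "payments", "message": "request ok"},
    {"service": "auth", "message": "timeout on token refresh"},
]
audit = []
out = run_pipeline(logs, "payments", ["timeout", "db"], audit)
print(out["summary"])  # 1 relevant log line(s) found
```

Persisting every agent's inputs and outputs as table rows, rather than logging them ad hoc, is what makes the lineage and auditability claims above enforceable.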
Upcoming: Iceberg Summit