Vineel Arekapudi
Engineering Data Platforms from Storage to API, Senior Data Engineer Consultant at Wells Fargo
Chattanooga, Tennessee, United States
I work at the intersection of large-scale data engineering, cloud platforms, and applied AI. I currently build and lead modern data platforms at a major U.S. bank, where I design Lakehouse and streaming architectures that operate at multi-billion-record scale across cloud environments.
My background spans the full evolution of enterprise data systems, from mainframe and Teradata warehouses to cloud-native lakehouses built on Spark, Iceberg, Kafka, and Kubernetes. Over the past decade, my work has focused on building production-grade data platforms: high-throughput ingestion pipelines, real-time analytics systems, and ML-ready data infrastructure used by data scientists, analysts, and AI teams.
In addition to data engineering, I have deep experience in full-stack platform development using Java, Spring Boot, REST APIs, and modern front-end frameworks. This allows me to design data systems not just as pipelines, but as products — complete with APIs, services, governance layers, and developer tooling.
My current interests include open table formats (Apache Iceberg), lakehouse architecture, metadata-driven governance, and building scalable AI-ready data platforms. I enjoy sharing practical lessons from real production systems — what works, what breaks, and how to design data infrastructure that lasts.
Backing Up Apache Iceberg Tables Across Environments Using Project Nessie
In this talk, we walk through Wells Fargo's approach to backing up and synchronizing Apache Iceberg tables across environments, using Project Nessie as a catalog-level control plane. By combining object storage replication with Nessie’s Git-like metadata versioning, we demonstrate how production Iceberg tables can be continuously mirrored into non-production catalogs.
The architecture consists of two coordinated replication layers:
1. Storage-Layer Replication
All Iceberg table artifacts, including Parquet data files, manifests, and metadata JSON, are replicated from production S3 into non-production S3 using standard enterprise tooling (rclone, distcp, or object-store replication).
2. Catalog-Layer Replication with Nessie
Production Nessie stores authoritative Iceberg metadata
Non-production Nessie runs as a separate instance
Nessie’s MongoDB collections (objs2, refs2) are periodically synchronized across environments
Once both layers are synchronized, the non-production Nessie catalog becomes a true mirror of production.
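The two-layer flow described above can be sketched in a few lines of Python. The dicts below stand in for the prod/non-prod object stores and Nessie's MongoDB collections; all names are illustrative, and in production the storage layer would be handled by rclone/distcp/S3 replication and the catalog layer by syncing the Nessie backing database.

```python
def replicate_storage(prod_store: dict, nonprod_store: dict) -> list:
    """Layer 1: copy any Iceberg file (data, manifest, metadata JSON)
    that is missing or stale in non-prod. Returns the keys copied."""
    copied = []
    for key, blob in prod_store.items():
        if nonprod_store.get(key) != blob:
            nonprod_store[key] = blob
            copied.append(key)
    return copied

def replicate_catalog(prod_nessie: dict, nonprod_nessie: dict) -> None:
    """Layer 2: mirror Nessie's backing collections (e.g. objs2, refs2)
    so the non-prod catalog resolves the same commits and branch heads."""
    for collection in ("objs2", "refs2"):
        nonprod_nessie[collection] = dict(prod_nessie[collection])

# Example: prod has one new metadata file that non-prod is missing.
prod_s3 = {
    "warehouse/tbl/data/f1.parquet": b"rows",
    "warehouse/tbl/metadata/v2.json": b"meta",
}
nonprod_s3 = {"warehouse/tbl/data/f1.parquet": b"rows"}
print(replicate_storage(prod_s3, nonprod_s3))
# ['warehouse/tbl/metadata/v2.json']
```

The key design point is ordering: data files must land before the catalog sync, so that every snapshot the mirrored Nessie catalog references is already readable in non-production storage.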
Back to the Future: Time Travel in Microsoft Fabric for Iceberg-based Tables
This lightning talk takes a quick dive into the metadata layer of Iceberg to cover these topics:
- Overview of Fabric/Iceberg internal metadata tables (essentials for time travel)
- Time travel queries such as:
SELECT * FROM db.table.history;
SELECT * FROM db.table.snapshots;
SELECT * FROM db.table.files;
SELECT * FROM db.table.manifests;
SELECT * FROM db.table.partitions;
- Advanced Topics:
a. Rollback
b. Maintenance - e.g., compaction (rewrite_data_files), remove orphan files, expire snapshots
- CoW (Copy on Write) vs MoR (Merge on Read)
a. Default - V2 Copy on Write
b. V2 Merge on Read
c. V3 Merge on Read (the most capable, but query engines such as Dremio do not seem to support it yet)
- Nessie branching: branching at the catalog level
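As a taste of the rollback and maintenance topics above, the calls look roughly like the following in Spark SQL with the Iceberg runtime (procedure names come from Iceberg's built-in Spark procedures; catalog, table, snapshot ID, and timestamp are placeholders):

```sql
-- Roll the table back to an earlier snapshot (ID taken from db.table.snapshots)
CALL catalog.system.rollback_to_snapshot('db.table', 1234567890123456789);

-- Compact small files
CALL catalog.system.rewrite_data_files(table => 'db.table');

-- Remove files no longer referenced by any table metadata
CALL catalog.system.remove_orphan_files(table => 'db.table');

-- Expire old snapshots to reclaim metadata and data files
CALL catalog.system.expire_snapshots(table => 'db.table', older_than => TIMESTAMP '2025-01-01 00:00:00');
```

Note that expiring snapshots trades away time-travel range for storage, so retention windows should be set with the queries above in mind.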
At the end of the talk, participants will leave with a better understanding of Fabric/Iceberg time travel and maintenance features.
Weaving RAGs into Fabric: A Governed Lakehouse Architecture for Enterprise AI Agents
In this session we look at how Wells Fargo implements a “RAG-as-a-Service” multi-agent architecture built on Microsoft Fabric, where logs are centralized in OneLake and Databricks serves as the execution layer for orchestrating sequential agents. The architecture follows a two-stage pattern inspired by real-world incident triage: a Log Retrieval Agent queries and assembles relevant context from Lakehouse tables using hybrid retrieval, and a Root Cause Processing Agent then consumes that context to generate structured summaries and recommended next steps. All intermediate outputs are persisted back into Fabric for governance, lineage, and observability.
Key Highlights:
Fabric OneLake as the governed context backbone for enterprise logs and metadata
Reusable RAG-as-a-Service layer exposing context retrieval and management APIs
Multi-agent orchestration: Log Retrieval Agent followed by Root Cause Processing Agent
Hybrid retrieval combining Lakehouse SQL filtering with semantic similarity search
Full lineage and auditability by persisting agent inputs and outputs back into Fabric tables
This session emphasizes a clear, enterprise-ready architecture rather than isolated AI demos.
Attendees will gain a concrete blueprint for implementing governed, multi-agent RAG workflows directly on Microsoft Fabric.
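A minimal sketch of the two-stage pattern above, with all names illustrative: real retrieval would combine Lakehouse SQL filtering with vector similarity search over OneLake tables, and the processing agent would prompt an LLM. Here both are stubbed so only the orchestration shape and the audit write-back are shown.

```python
def hybrid_retrieve(logs, service, query_terms):
    """Stage 1 (Log Retrieval Agent): SQL-style filter on service, then a
    crude term-overlap rank standing in for semantic similarity search."""
    filtered = [l for l in logs if l["service"] == service]
    scored = [(sum(t in l["message"] for t in query_terms), l) for l in filtered]
    return [l for score, l in sorted(scored, key=lambda s: -s[0]) if score > 0]

def root_cause_agent(context):
    """Stage 2 (Root Cause Processing Agent): consume retrieved context and
    emit a structured summary. A real implementation would call an LLM."""
    return {
        "evidence": [l["message"] for l in context],
        "summary": f"{len(context)} relevant log line(s) found",
    }

def run_pipeline(logs, service, query_terms, audit_table):
    context = hybrid_retrieve(logs, service, query_terms)
    result = root_cause_agent(context)
    # Persist intermediate inputs and outputs, mirroring the write-back into
    # Fabric tables for lineage and observability.
    audit_table.append({"context": context, "result": result})
    return result

logs = [
    {"service": "payments", "message": "timeout connecting to db"},
    {"service": "payments", "message": "request ok"},
    {"service": "auth", "message": "timeout on token refresh"},
]
audit = []
out = run_pipeline(logs, "payments", ["timeout", "db"], audit)
print(out["summary"])  # 1 relevant log line(s) found
```

Persisting every agent's inputs and outputs as table rows, rather than logging them ad hoc, is what makes the lineage and auditability claims above enforceable.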
Upcoming: Iceberg Summit