
Vijay Shekhawat
Staff Software Engineer - Data at TRM Labs
Bristol, United Kingdom
Vijay is a seasoned data and software engineer currently serving as the engineering lead for TRM Labs' Next-Generation Data Platform. Earlier in his career he worked at LinkedIn and Expedia, and he brings deep expertise in Data Lakehouse architectures, real-time streaming, and building secure, high-throughput pipelines for petabyte-scale, customer-facing analytics.
Engineering a Next-Gen Lakehouse: StarRocks + Iceberg in Production at TRM Labs
At TRM Labs, we process petabyte-scale blockchain data across 40+ networks to deliver real-time intelligence to financial institutions, governments, and enterprises. As our need for low-latency, high-throughput analytics grew, our legacy serverless data warehouse architecture began to show its limitations.
In this talk, we'll share how we reimagined our data platform—migrating to a modern lakehouse architecture built on Apache Iceberg and powered by StarRocks to support sub-second, user-facing analytics. We'll cover the core design principles behind the platform, the architectural evolution of our StarRocks deployment, and the operational insights we've gathered over a year of production usage.
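To make the serving path concrete, here is a minimal sketch of querying an Iceberg-backed table through StarRocks. StarRocks speaks the MySQL wire protocol, so any standard MySQL client works; the host, catalog, and table names below are illustrative placeholders, not our production setup.

```python
# Minimal sketch: querying an Iceberg table through StarRocks over the
# MySQL wire protocol. All hostnames and table names are placeholders.
import pymysql

conn = pymysql.connect(
    host="starrocks-fe.example.internal",  # StarRocks frontend (placeholder)
    port=9030,                             # default FE query port
    user="analytics_ro",
    password="...",
)
try:
    with conn.cursor() as cur:
        # An external catalog lets StarRocks query Iceberg tables in place,
        # addressed as catalog.database.table.
        cur.execute(
            "SELECT chain, COUNT(*) AS transfers "
            "FROM iceberg_catalog.analytics.token_transfers "
            "WHERE event_date = CURRENT_DATE() "
            "GROUP BY chain"
        )
        for chain, transfers in cur.fetchall():
            print(chain, transfers)
finally:
    conn.close()
```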
---
**What You'll Learn**
**1. Replatforming to a Lakehouse Architecture**
- How we transitioned from BigQuery and Postgres to an open, modular lakehouse stack.
- Apache Iceberg's role as the foundational storage layer.
- Why we selected StarRocks as our low-latency analytical engine.
**2. Powering Real-Time Customer Analytics**
- How we use StarRocks to serve customer-facing dashboards and APIs.
- Handling concurrency and diverse query patterns.
- Integrating StarRocks with our upstream event-driven pipelines.
**3. Production Lessons from One Year in the Wild**
- Best Practices: Managing clusters, enabling cross-environment reads, and accelerating developer workflows.
- Performance & Cost Optimization:
  - Dual-layered Kubernetes architecture separating ingestion and serving workloads.
  - Advanced data cache tuning for Iceberg-backed StarRocks queries.
  - Dynamic cluster scaling and compute-storage decoupling (see the sketch after this list).
- Future Focus: Scaling to support 100+ TPS APIs and hundreds of TBs with StarRocks + Iceberg.
- Tips & Tricks: Versioning, release automation, and extending StarRocks through internal development.
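To ground the compute-storage decoupling point above: because the tables live in Iceberg on object storage, they can also be read directly, independent of the serving engine. A minimal sketch using PyIceberg follows; the catalog endpoint and table name are placeholders.

```python
# Minimal sketch: reading an Iceberg table directly with PyIceberg,
# bypassing the query engine. Endpoint and names are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakehouse",
    **{
        "type": "rest",
        "uri": "https://iceberg-rest.example.internal",  # placeholder
    },
)
table = catalog.load_table("analytics.token_transfers")

# Predicate pushdown: Iceberg metadata prunes non-matching data files
# before anything is fetched from object storage.
batch = (
    table.scan(row_filter="event_date >= '2024-01-01'", limit=1_000)
    .to_arrow()
)
print(batch.num_rows)
```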
Architecting Real-Time Blockchain Intelligence with Apache Beam and Apache Kafka
At TRM Labs, we manage petabyte-scale data from over 30 blockchains to deliver customer-facing analytics. Our platform processes high-throughput data to extract actionable intelligence for critical decision-making.
In this session, we will discuss how Apache Beam underpins our architecture, integrating with Apache Kafka for robust data ingestion and running on Google Cloud Dataflow for scalability and fault tolerance. We will also examine the challenges of handling massive volumes of blockchain data in real time, at peaks of up to one million events per second, while computing complex metrics on those streams.
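As a rough sketch of this ingestion pattern (not our production pipeline), the following Beam job reads from Kafka with the cross-language ReadFromKafka source and computes a simple windowed count. Broker, topic, and field names are placeholders, and a real Dataflow deployment would add the usual runner and project options.

```python
# Hedged sketch: a streaming Beam pipeline reading blockchain events
# from Kafka. Placeholders throughout; runner flags omitted.
import json

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)  # add DataflowRunner flags to deploy

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "kafka.example.internal:9092"},
            topics=["ethereum-transactions"],  # illustrative topic name
        )
        # ReadFromKafka yields (key_bytes, value_bytes) pairs.
        | "Decode" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
        | "KeyByAddress" >> beam.Map(lambda e: (e["to_address"], 1))
        | "Window" >> beam.WindowInto(window.FixedWindows(60))
        | "CountPerKey" >> beam.CombinePerKey(sum)
        | "Log" >> beam.Map(print)
    )
```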
Key Takeaways:
• Designing and scaling a real-time streaming data platform to meet the rigorous demands of petabyte-scale blockchain data.
• Employing Apache Kafka for reliable, high-throughput data ingestion, with practical insights from networks such as BSC, Ethereum, and Tron.
• Leveraging Apache Beam and Google Cloud Dataflow for scalable and flexible data processing and enrichment.
• Ensuring exactly-once semantics for transactional data (see the producer-side sketch after this list).
• Optimizing high-throughput writes by fine-tuning the JDBC protocol at the TCP layer.
• Implementing best practices for performance, monitoring, maintenance, and security in a high-stakes, real-time streaming environment.
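For the exactly-once takeaway above, here is a minimal producer-side sketch using Kafka transactions via confluent-kafka. It illustrates the general technique only, not TRM Labs' actual pipeline; a complete exactly-once flow would also bind consumer offsets into the transaction, and all names below are placeholders.

```python
# Hedged sketch: atomic batch writes with an idempotent, transactional
# Kafka producer. Broker, topic, and ids are placeholders.
from confluent_kafka import Producer

producer = Producer(
    {
        "bootstrap.servers": "kafka.example.internal:9092",
        "enable.idempotence": True,             # no duplicates on retry
        "transactional.id": "enrich-worker-1",  # stable per producer instance
    }
)
producer.init_transactions()

def publish_batch(events):
    """Write a batch atomically: all events commit or none do."""
    producer.begin_transaction()
    try:
        for event in events:
            producer.produce("enriched-transfers", value=event)
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()
        raise

publish_batch([b'{"tx": "0xabc", "usd": 12.5}'])
producer.flush()
```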
From Zero to One: Building a Petabyte-Scale Data Analytics Platform with Apache Iceberg™
Apache Iceberg™ is transforming modern data architecture by providing the efficiency and flexibility of a managed data warehouse without the vendor lock-in. At TRM Labs, our data platform has traditionally relied on BigQuery and distributed Postgres to serve queries over terabyte-sized datasets for external, customer-facing consumption with low latency and high concurrency; this solution proved to be both expensive and limiting. We made a bold move: adopting Iceberg at the core of a petabyte-scale data lakehouse to power external, user-facing analytics.
In this session, we will discuss why your organization should consider adopting Iceberg. We will cover how to benchmark it against other table formats for a high-performance, low-latency analytics platform, and the key architectural decisions across data ingestion, data modeling, compute optimization, and data operations that enable efficient scaling. We will also share performance-tuning techniques, including clustering and advanced data and metadata caching, that helped us improve query efficiency and reduce compute and storage costs. Finally, for those building a roadmap for adopting a lakehouse, we will share practical suggestions and lessons learned.
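As a rough illustration of the benchmarking step, here is a minimal harness that times a fixed query repeatedly through any DB-API cursor and reports latency percentiles; which engine, table format, and query you plug in is up to your comparison.

```python
# Minimal sketch: a micro-benchmark loop for comparing table formats or
# cache settings. The cursor and SQL are supplied by the caller.
import statistics
import time

def benchmark(cursor, sql, runs=20):
    """Run the same query `runs` times and summarize latency."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        cursor.execute(sql)
        cursor.fetchall()  # include result transfer in the measurement
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }
```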
Whether you’re considering Iceberg or scaling an existing implementation, this session will equip you with actionable insights to build a long-term, high-performance analytics strategy.
