Speaker

Vijay Shekhawat

Staff Software Engineer - Data at TRM Labs

Bengaluru, India

Vijay is a seasoned data and software engineer and the engineering lead for TRM Labs' Next-Generation Data Platform. He has previously worked at companies such as LinkedIn and Expedia, and he brings deep expertise in Data Lakehouse architectures, real-time streaming, and building secure, high-throughput pipelines for petabyte-scale, customer-facing analytics.

Area of Expertise

  • Information & Communications Technology

Topics

  • Data Lakehouse
  • Apache Iceberg
  • Data Engineering
  • Big Data
  • Data Platform
  • Database

Architecting Real-Time Blockchain Intelligence with Apache Beam and Apache Kafka

At TRM Labs, we manage petabyte-scale data from over 30 blockchains to deliver customer-facing analytics. Our platform processes high-throughput data to extract actionable intelligence for critical decision-making.

In this session, we will discuss how Apache Beam underpins our architecture, integrating with Apache Kafka for robust data ingestion and running on Google Cloud Dataflow for scalability and fault tolerance. We will also delve into the complexities of handling massive volumes of blockchain data—peaking at up to one million events per second—in real time while computing complex metrics.
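As a rough, stdlib-only illustration (not TRM's actual pipeline code), the core streaming computation—assigning events to fixed time windows and counting per key—can be sketched in plain Python; a real deployment would express the same logic as a Beam pipeline reading from Kafka and running on Dataflow. The event data here is invented for the example.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical blockchain events: (epoch_seconds, chain) pairs.
events = [
    (1700000000, "ethereum"),
    (1700000000, "tron"),
    (1700000001, "ethereum"),
    (1700000061, "bsc"),
]

def window_key(ts, width=60):
    """Assign an event timestamp to the start of its fixed 60-second window."""
    return ts - (ts % width)

# Count events per (window, chain) -- analogous to a Beam
# FixedWindows assignment followed by a Count-per-key transform.
counts = Counter((window_key(ts), chain) for ts, chain in events)

for (start, chain), n in sorted(counts.items()):
    label = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%H:%M:%S")
    print(f"{label} {chain}: {n}")
```

The sketch is deliberately batch-shaped; the point is only the window-then-aggregate structure that the streaming pipeline applies continuously.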

Key Takeaways:
• Designing and scaling a real-time streaming data platform to meet the rigorous demands of petabyte-scale blockchain data.
• Employing Apache Kafka for reliable, high-throughput data ingestion, with practical insights from networks such as BSC, Ethereum, and Tron.
• Leveraging Apache Beam and Google Cloud Dataflow for scalable and flexible data processing and enrichment.
• Ensuring exactly-once semantics for transactional data.
• Optimizing high-throughput writes by tuning JDBC connections down to the TCP layer.
• Implementing best practices for performance, monitoring, maintenance, and security in a high-stakes, real-time streaming environment.
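On the exactly-once point: end-to-end exactly-once semantics ultimately rest on idempotent writes at the sink. A minimal sketch of that idea (assumed field and variable names, not TRM's implementation): key every record by a stable transaction ID, so a redelivered Kafka batch has no effect.

```python
# Minimal idempotent-sink sketch: applying the same batch twice leaves
# the store unchanged, because each record is keyed by a stable
# transaction ID (the "tx_id" field name is an assumption for this demo).

store = {}            # tx_id -> record; stands in for a database table
applied_ids = set()   # processed-ID ledger; persisted atomically in practice

def apply_batch(batch):
    """Apply records idempotently; duplicates and replays are no-ops."""
    written = 0
    for record in batch:
        tx_id = record["tx_id"]
        if tx_id in applied_ids:
            continue  # duplicate delivery -- skip
        store[tx_id] = record
        applied_ids.add(tx_id)
        written += 1
    return written

batch = [{"tx_id": "0xabc", "value": 10}, {"tx_id": "0xdef", "value": 7}]
first = apply_batch(batch)   # writes both records
second = apply_batch(batch)  # simulated redelivery -- writes nothing
print(first, second, len(store))  # -> 2 0 2
```

In a production sink, the record write and the ledger update would happen in one transaction; the toy dictionaries only show the dedup-by-key shape of the technique.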

From Zero to One: Building a Petabyte-Scale Data Analytics Platform with Apache Iceberg™

Apache Iceberg™ is transforming modern data architecture by providing the efficiency and flexibility of a managed data warehouse without the vendor lock-in. At TRM Labs, our data platform has traditionally relied on BigQuery and distributed Postgres to serve queries over terabyte-sized datasets for external, customer-facing consumption with low latency and high concurrency; this solution proved to be both expensive and limiting. We made a bold move: adopting Iceberg at the core of a petabyte-scale data lakehouse to power external, user-facing analytics.

In this session, we will discuss why your organization should consider adopting Iceberg. We will cover how to benchmark it against other table formats for a high-performance, low-latency analytics platform, along with key architectural decisions—across data ingestion, data modeling, compute optimization, and data operations—that enable efficient scaling. We will also share performance-tuning techniques, including clustering and advanced data and metadata caching, that helped us improve query efficiency and reduce compute and storage costs. Finally, for those building a roadmap toward a lakehouse, we will share practical suggestions and lessons learned.
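To make the clustering point concrete, here is a toy, stdlib-only model of the min/max file pruning that Iceberg-style metadata enables: clustering (sorting) data by a commonly queried key narrows each file's stats range, so a point query can skip most files without reading them. All names and numbers here are illustrative, not from TRM's platform.

```python
# Toy model of Iceberg-style min/max pruning (illustrative only).
# Each "file" carries column statistics; a clustered layout lets the
# query planner skip files whose [min, max] range excludes the key.

def make_files(values, rows_per_file=3):
    """Split values into fixed-size files, recording min/max stats."""
    files = []
    for i in range(0, len(values), rows_per_file):
        chunk = values[i:i + rows_per_file]
        files.append({"rows": chunk, "min": min(chunk), "max": max(chunk)})
    return files

def files_scanned(files, key):
    """Count files whose stats range could contain the key."""
    return sum(1 for f in files if f["min"] <= key <= f["max"])

values = [7, 2, 9, 4, 1, 8, 3, 6, 5]
unclustered = make_files(values)        # overlapping stats ranges
clustered = make_files(sorted(values))  # clustering narrows each range

print(files_scanned(unclustered, 5), files_scanned(clustered, 5))  # -> 3 1
```

The same effect, at petabyte scale and with real Iceberg manifests, is what turns clustering into large compute and I/O savings.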

Whether you’re considering Iceberg or scaling an existing implementation, this session will equip you with actionable insights to build a long-term, high-performance analytics strategy.
