Building a Modern Streaming Data Pipeline with Apache Flink, Iceberg and Paimon

Streaming data architectures have evolved beyond traditional batch ETL pipelines. With the rise of streaming data lakes, enterprises can now build real-time, scalable, and cost-efficient processing systems that join event streams from Kafka with transactional data from MySQL, aggregate the results, and store them in modern table formats such as Apache Iceberg and Apache Paimon.

This talk will walk through how to architect a robust, end-to-end streaming data pipeline, covering:
• Consuming Real-Time Data from Kafka: Best practices for handling high-throughput streaming data.
• Joining with MySQL: Using Flink SQL to enrich streaming events with transactional data (see the sketch after this list).
• Aggregating and Transforming Data: Efficient stateful processing techniques for large-scale real-time analytics.
• Apache Iceberg vs. Apache Paimon: Key features, trade-offs, and when to use each for a scalable, queryable streaming data lake (a Paimon variant of the sink follows the sketch below).
• Real-World Use Cases: How companies are adopting streaming data lake architectures to improve reporting, machine learning, and real-time operational analytics.
• Comparing with Traditional Architectures: Why moving from batch ETL and traditional data warehouses to a streaming-first approach improves latency, cost efficiency, and data freshness.
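
To make these steps concrete, below is a minimal Flink SQL sketch of the pipeline: a Kafka source, a lookup join against MySQL, a windowed aggregation, and an Iceberg sink. All names (the orders topic, the customers table, the catalog and warehouse paths, the credentials) are hypothetical placeholders, and the options assume the Kafka, JDBC, and Iceberg connectors are available on the Flink classpath.

    -- Kafka source: raw order events (topic, schema, and brokers are placeholders).
    CREATE TABLE orders (
      order_id    STRING,
      customer_id BIGINT,
      amount      DECIMAL(10, 2),
      order_time  TIMESTAMP(3),
      proc_time   AS PROCTIME(),  -- processing-time attribute, required by the lookup join
      WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
    ) WITH (
      'connector' = 'kafka',
      'topic' = 'orders',
      'properties.bootstrap.servers' = 'kafka:9092',
      'properties.group.id' = 'order-enrichment',
      'scan.startup.mode' = 'latest-offset',
      'format' = 'json'
    );

    -- MySQL dimension table, queried on demand through the JDBC connector.
    CREATE TABLE customers (
      customer_id BIGINT,
      region      STRING,
      PRIMARY KEY (customer_id) NOT ENFORCED
    ) WITH (
      'connector' = 'jdbc',
      'url' = 'jdbc:mysql://mysql:3306/shop',
      'table-name' = 'customers',
      'username' = 'flink',
      'password' = '...'
    );

    -- Enrich each streaming event with transactional data via a lookup join.
    CREATE TEMPORARY VIEW enriched AS
    SELECT o.order_id, o.amount, o.order_time, c.region
    FROM orders AS o
    JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
      ON o.customer_id = c.customer_id;

    -- Iceberg catalog and sink table for the aggregated results.
    CREATE CATALOG iceberg_cat WITH (
      'type' = 'iceberg',
      'catalog-type' = 'hadoop',
      'warehouse' = 's3://lake/warehouse'
    );
    CREATE DATABASE IF NOT EXISTS iceberg_cat.lake;
    CREATE TABLE IF NOT EXISTS iceberg_cat.lake.revenue_by_region (
      region       STRING,
      window_start TIMESTAMP(3),
      total_amount DECIMAL(10, 2)
    );

    -- Stateful aggregation: per-region revenue over one-minute tumbling windows.
    INSERT INTO iceberg_cat.lake.revenue_by_region
    SELECT region, window_start, SUM(amount) AS total_amount
    FROM TABLE(TUMBLE(TABLE enriched, DESCRIPTOR(order_time), INTERVAL '1' MINUTE))
    GROUP BY region, window_start, window_end;

Note that the Iceberg sink commits data files on Flink checkpoints, so checkpointing must be enabled for results to become visible to readers.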
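
Swapping in Paimon is largely a matter of pointing the same INSERT at a Paimon table. A sketch, again with placeholder paths: Paimon's primary-key tables merge repeated writes for the same key natively, which suits continuously updated aggregates.

    -- Paimon alternative: same pipeline, different table format for the sink.
    CREATE CATALOG paimon_cat WITH (
      'type' = 'paimon',
      'warehouse' = 's3://lake/paimon'  -- placeholder warehouse path
    );
    CREATE DATABASE IF NOT EXISTS paimon_cat.lake;

    -- The primary key lets Paimon merge repeated writes for the same
    -- region/window, so the table absorbs streaming upserts natively.
    CREATE TABLE IF NOT EXISTS paimon_cat.lake.revenue_by_region (
      region       STRING,
      window_start TIMESTAMP(3),
      total_amount DECIMAL(10, 2),
      PRIMARY KEY (region, window_start) NOT ENFORCED
    );

    -- The INSERT statement above works unchanged with this table as the target.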

By the end of this session, attendees will understand how to build a scalable, real-time streaming pipeline that integrates Kafka, MySQL, Flink, and Apache Iceberg/Paimon to power low-latency, high-throughput analytics. This talk is perfect for data engineers, architects, and platform teams looking to modernize their data stack with real-time data lake architectures.

Abdul Rehman Zafar

Senior Solutions Architect at Ververica

Berlin, Germany
