Session

Real-time Event Joining in Practice With Kafka and Flink

Historically, machine learning training pipelines have relied heavily on batch training, with models retrained every few hours. However, industry practitioners have shown that real-time training can yield a more adaptive and personalized user experience. The transition from batch to real-time is full of tradeoffs: capturing the gains in accuracy and freshness while keeping costs low and the system predictable and maintainable.

This session will delve into our journey of migrating our ML models to a streaming pipeline built on Kafka and Flink. You will learn how to transition from Pub/Sub to Kafka for incoming real-time events and how to leverage Flink for streaming joins using RocksDB and checkpointing. We will also discuss navigating nuances such as causal dependencies between events, event time versus processing time, and exactly-once versus at-least-once delivery, among others.
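As a rough illustration of the kind of join the session describes, here is a minimal Flink sketch (not the actual ShareChat pipeline): two Kafka topics consumed with event-time watermarks, an interval join keyed on a shared identifier, RocksDB as the state backend, and exactly-once checkpointing. The broker address, topic names, key format, join window, and the joinKey helper are assumptions made for the example.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class EventJoinJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep join state in RocksDB so it can grow beyond heap size, and checkpoint it
        // every 60s; exactly-once mode trades some latency for consistent state on recovery.
        env.setStateBackend(new EmbeddedRocksDBStateBackend());
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        DataStream<String> impressions = fromKafka(env, "impressions");
        DataStream<String> clicks = fromKafka(env, "clicks");

        // Event-time interval join: a click is matched to its impression only if it
        // arrives within 30 minutes of the impression, which also bounds RocksDB state.
        impressions
            .keyBy(EventJoinJob::joinKey)
            .intervalJoin(clicks.keyBy(EventJoinJob::joinKey))
            .between(Time.seconds(0), Time.minutes(30))
            .process(new ProcessJoinFunction<String, String, String>() {
                @Override
                public void processElement(String impression, String click,
                                           Context ctx, Collector<String> out) {
                    // Emit the joined record (e.g. a labeled training example).
                    out.collect(impression + "|" + click);
                }
            })
            .print(); // stand-in for a real sink, such as a Kafka topic feeding the trainer

        env.execute("real-time-event-join");
    }

    // Build a Kafka source whose watermarks tolerate 30 seconds of out-of-order arrival.
    private static DataStream<String> fromKafka(StreamExecutionEnvironment env, String topic) {
        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("kafka:9092") // assumed broker address
            .setTopics(topic)
            .setGroupId("event-joiner")
            .setStartingOffsets(OffsetsInitializer.latest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();
        return env.fromSource(
            source,
            WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(30)),
            topic);
    }

    // Hypothetical key extractor: assumes each payload starts with "<requestId>|...".
    private static String joinKey(String payload) {
        return payload.split("\\|", 2)[0];
    }
}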

Furthermore, you will see how we used topic partitioning in Kafka to improve scalability, cut the volume of event data by 85% with Avro schemas and compression, reduced cost by 40%, and set up a separate pipeline to ensure correctness.
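For a sense of what the producer side of such a setup might look like, the sketch below configures a Kafka producer that keys records by a join key (so Kafka's default partitioner keeps related events in the same partition, letting consumers scale with partition count), serializes values as Avro via Confluent's schema-registry serializer, and enables zstd compression. The broker and registry addresses, topic name, batching settings, and the choice of zstd are illustrative assumptions, not the configuration described in the session.

import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {

    public static KafkaProducer<String, GenericRecord> build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed broker address
        // Compact binary Avro payloads; the schema lives in the registry, not in each record.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081"); // assumed registry address
        // Batch and compress on the wire; zstd typically shrinks payloads well over JSON
        // at modest CPU cost (compression codec chosen here as an assumption).
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
        return new KafkaProducer<>(props);
    }

    public static void send(KafkaProducer<String, GenericRecord> producer,
                            String joinKey, GenericRecord event) {
        // Keying by the join key means all events for the same request hash to the same
        // partition, keeping related events together for the downstream streaming join.
        producer.send(new ProducerRecord<>("user-events", joinKey, event));
    }
}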

By attending this session, you'll gain a deeper understanding of the tradeoffs and nuances of real-time systems, allowing you to make well-informed decisions that suit your specific requirements.

Srijan Saket

Staff Machine Learning Engineer at ShareChat, transforming India's content ecosystem by building the largest social media platform for Bharat

Seattle, Washington, United States
