Dinesh Israni is a Principal Software Engineer at Portworx with over 12 years of experience building distributed storage solutions. He currently develops cloud-native storage solutions and leads the open-source STORK project, through which he aims to help customers deploy and protect their cloud-native applications at scale.
Kafka consumer state handling has come a long way with the introduction of the Incremental Rebalancing (KIP-429) and Static Membership (KIP-345) protocols, along with standby replicas. Yet when run as containerized workloads, ordinary Kafka Streams-based data pipelines can still experience downtime during continuous delivery. Application life-cycle events such as rolling and blue-green upgrades, canary deployments, and scaling up or down pose a particular challenge due to state shuffling between brokers and applications.
In this talk, we will present two solutions to this pipeline availability challenge for large stateful Kafka Streams applications. The first approach combines the Kubernetes Container Storage Interface (CSI), the Kubernetes scheduler extender Stork, and the Kafka consumer static membership protocol. Together, they help compatible software-defined storage solutions maintain the relationship between the underlying Kafka Streams applications and the persistent volumes holding their existing RocksDB data. This approach effectively implements a snapshotting mechanism that is transparent to the application layer. The second approach externalizes the Kafka Streams StateStore using its pluggable interface, which decouples the lifecycle of the application from the underlying persistent store. We will evaluate OSS technology choices such as Redis and Apache Geode for implementing this design pattern. Finally, we will discuss the tradeoffs between system complexity and stream processing performance, comparing both approaches to the baseline implementation.
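As a rough illustration of the first approach, the sketch below shows how a Kafka Streams instance might be configured so that its consumer identity and its RocksDB directory both survive a pod restart. It is a minimal sketch under stated assumptions, not the configuration used in the talk: the class name `StaticMembershipConfig`, the application id, the bootstrap address, the mount path, and the timeout value are all hypothetical, and it assumes the pod has a stable name (for example, from a StatefulSet) exposed through the `HOSTNAME` environment variable. Only the property keys themselves (`group.instance.id`, `session.timeout.ms`, `state.dir`, `num.standby.replicas`) are standard Kafka configuration names.

```java
import java.util.Properties;

// Hypothetical helper: builds Kafka Streams properties for a pod whose
// identity and local state are meant to outlive a restart.
final class StaticMembershipConfig {

    static Properties build(String applicationId, String podName, String stateDir) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        // Placeholder address; replace with the real broker list.
        props.setProperty("bootstrap.servers", "kafka:9092");
        // Static membership (KIP-345): a stable id lets a restarted instance
        // rejoin the group without triggering a rebalance, as long as it
        // returns before the session timeout expires.
        props.setProperty("group.instance.id", podName);
        // Assumed value: long enough to cover a pod restart during a rollout.
        props.setProperty("session.timeout.ms", "300000");
        // Assumed mount point of the persistent volume holding RocksDB data.
        props.setProperty("state.dir", stateDir);
        // Standby replicas keep a warm copy of state on another instance.
        props.setProperty("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        // In Kubernetes, HOSTNAME defaults to the pod name.
        String podName = System.getenv().getOrDefault("HOSTNAME", "orders-app-0");
        Properties props = build("orders-app", podName, "/var/lib/kafka-streams");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting `Properties` object would be passed to the `KafkaStreams` constructor together with the topology. The design intent is that the same pod name is always paired with the same volume, so a restarted instance finds its existing state on disk and resumes its previous task assignment.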
The takeaway for the audience is an understanding of how the internal Kafka Streams state store can best be managed using cloud-native technologies and practices.