Speaker

Satish Duggana

Leads Data/Streaming Infra

Satish Duggana leads Data/Streaming Infrastructure at Uber, Bangalore. He is an Apache Kafka committer and PMC member, an Apache Storm committer and PMC member, and has contributed to a few other open source projects.

Better reliability and availability of Kafka with enhanced replication protocol and quota controls.

Apache Kafka introduced the tiered storage feature (KIP-405), which isolates storage from processing in a broker. Segments remain available according to configurable remote retention limits based on time and size. In this talk, we plan to dive into a couple of features added to tiered storage that improve the reliability and availability of brokers in the cluster. These are:
1. Quota controls for copy/read from remote storage
2. Tiered offset strategy to improve the availability of new brokers in case of broker failures (an enhancement to the replication protocol for tiered storage)

This talk will help attendees understand the tiered storage feature and its extensions, such as quota controls and the tiered offset strategy, so that they can run their clusters with better reliability, availability, and efficiency.

What are we waiting for in the Kafka Tiered Storage Early Access Release?

With Kafka 3.6.0, Apache Kafka brings the long-anticipated "tiered storage" feature as an early access release. Users are interested in knowing what we are still waiting for in this early access release, and what tiered storage will offer in the following releases. In this talk we will cover:

- Introduction of tiered storage in Apache Kafka
- Current set of features and limitations (3.6.0)
- What was added as part of 3.7.0
- What is planned to come in the next release (3.8.0)
  - Enable/Disable tiered storage gracefully on a topic
  - Quota controls on remote storage operations for better reliability
  - and more improvements
- Future plans (after 3.8.0)

This feature is already running in production at companies such as Uber and AWS. The community has been contributing changes and enhancements to make it production ready.

This talk will give the audience a good understanding of the feature, its current scope, what the community has been working on, and what to expect from Kafka tiered storage.

Learnings of running Kafka tiered storage at scale

KIP-405 introduced tiered storage in Apache Kafka. It separates compute and storage in brokers, which improves the scalability, efficiency, and elasticity of Kafka clusters. We implemented this feature and have been running it in production for several months in different tiers of clusters at Uber.

We will talk about the following:
- The principles followed in building the feature.
- The journey of deploying and running it in our production clusters with different workloads.
- The learnings from running it in production at large scale, which led to a few interesting features that extend KIP-405.
- The issues encountered and how we fixed or mitigated them.

Kafka Tiered Storage

Kafka is a vital part of data infrastructure in many organizations. When the Kafka cluster grows and more data is stored in Kafka for a longer duration, several issues related to scalability, efficiency, and operations become important to address. Kafka cluster storage is typically scaled by adding more broker nodes to the cluster. But this also adds unneeded memory and CPUs to the cluster, making the overall storage cost less efficient than storing the older data in external storage.

Tiered storage extends Kafka's storage beyond the local storage available on the Kafka cluster by retaining the older data in cheaper stores, such as HDFS, S3, Azure, or GCS, with minimal impact on the internals of Kafka.

We will talk about:
- How tiered storage addresses the above problems and also brings several other advantages.
- High-level architecture of tiered storage.
- Future work planned as part of tiered storage.

Deep dive into Kafka Tiered Storage

KIP-405 introduced tiered storage in Apache Kafka. The proposed design separates compute and storage, which lets brokers focus largely on serving producer and consumer requests rather than managing storage beyond local disks. But the important caveat here is that it should still maintain the same consistency semantics and lineage of data as in the local storage.
This talk dives into the internals of tiered storage and how we achieve those semantics, covering scenarios such as new brokers being bootstrapped, brokers having hard failures, and out-of-sync brokers becoming leaders.

We will also talk about how topic deletion lifecycle management is handled without leaking any segments in tiered storage, whether they are removed under retention policies or while deleting a topic or a partition.
