Speaker

Ryanne Dolan

data pipelines @LinkedIn

San Antonio, Texas, United States

Ryanne works on data pipelines and streaming infrastructure at LinkedIn and is the primary author of MirrorMaker 2. Previously at Twitter, Cloudera, and Hortonworks.

Area of Expertise

  • Media & Information

Topics

  • Kafka Connect
  • Kafka Streams
  • Apache Kafka
  • Streaming
  • Data Streams
  • stream processing
  • Streaming Data Analytics
  • Data Streaming
  • Data Replication
  • Data Pipelines
  • Data pipeline
  • Replications
  • replication
  • Event Streaming

Deeply Declarative Data Pipelines

With Flink and Kubernetes, it's possible to deploy stream processing jobs with just SQL and YAML. This low-code approach can save a lot of development time. However, there is more to data pipelines than just streaming SQL. We must wire up many different systems, thread through schemas, and, worst of all, write a lot of configuration.

In this talk, we'll explore just how "declarative" we can make streaming data pipelines on Kubernetes. I'll show how we can go deeper by adding more and more operators to the stack. How deep can we go?

Fan-in Flames: Scaling Kafka to Millions of Producers

At supermassive scale, a perennial problem with Kafka is "high fan-in" -- a large number of producers sending records to a small number of brokers. Even a relatively modest amount of data can overwhelm a broker when there are hundreds of thousands of concurrent producer requests.

This talk discusses a few real-world applications where high fan-in becomes a problem, and presents a few strategies for dealing with it. These include: fronting Kafka with an ingestion layer; separating brokers into read-only and write-only subsets; implementing specialized partitioning strategies; and scaling across clusters with "smart clients".
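As a rough illustration of what a "specialized partitioning strategy" might look like, the sketch below pins each producer to a small subset of partitions so that it needs to talk to fewer brokers. This is not taken from the talk: the function names, the subset size, and the hashing scheme are all assumptions made for the example, and it is written in plain Python rather than against any Kafka client API.

```python
import hashlib
from typing import List


def partition_subset(producer_id: str, num_partitions: int, subset_size: int) -> List[int]:
    """Return the partitions a given producer is allowed to write to.

    Hypothetical scheme: hash the producer id to a starting partition and
    take a contiguous (wrapping) window of `subset_size` partitions.
    """
    if not 0 < subset_size <= num_partitions:
        raise ValueError("subset_size must be in 1..num_partitions")
    digest = hashlib.sha256(producer_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % num_partitions
    return [(start + i) % num_partitions for i in range(subset_size)]


def choose_partition(producer_id: str, key: bytes, num_partitions: int, subset_size: int) -> int:
    """Pick a partition for one record, restricted to the producer's subset."""
    subset = partition_subset(producer_id, num_partitions, subset_size)
    key_hash = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return subset[key_hash % len(subset)]
```

One trade-off of a scheme like this: records with the same key sent by different producers can land in different partitions, so it only suits workloads that do not rely on per-key ordering across producers.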

Geo-replicated Kafka Streams Apps

Kafka Streams assumes that all your data lives in a single Kafka cluster. As you scale to multiple data centers, or into a hybrid/multi-cloud environment, you are likely to encounter a much more complicated reality.

This talk presents several strategies for dealing with geo-replicated Kafka topics in Kafka Streams applications. You'll see that it's easy to get started, but there are trade-offs to consider with each approach.
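One possible starting point, assuming the topics are replicated with MirrorMaker 2 under its default replication policy, is to rely on the naming convention that a topic mirrored from a cluster with alias `us-west` appears on the target as `us-west.<topic>`. The sketch below is not from the talk; the helper names are hypothetical, and it assumes cluster aliases contain no dots and that only single-hop replication is in play. It builds a pattern that matches both the local topic and its mirrored copies, which an application could use to subscribe to all of them at once.

```python
import re


def remote_topic(source_alias: str, topic: str, separator: str = ".") -> str:
    """Name of a mirrored topic under a '<source alias><separator><topic>' convention."""
    return f"{source_alias}{separator}{topic}"


def all_regions_pattern(topic: str) -> "re.Pattern[str]":
    """Regex matching the local topic and any single-hop mirrored copy of it.

    Assumes cluster aliases contain no '.' characters (an assumption of this sketch).
    """
    return re.compile(r"^(?:[^.]+\.)?" + re.escape(topic) + r"$")
```

Subscribing by pattern treats every region's records as one logical stream; whether that is appropriate depends on the trade-offs the abstract alludes to, such as ordering across regions and duplicate processing after failover.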

Connect at Twitter-scale

Twitter has one of the largest Kafka fleets in the world, handling hundreds of millions of events per second. To operate Kafka Connect at this scale, we've had to get creative. In this talk, we'll present some of the problems we've run into with Kafka Connect and how we've engineered around them.

Getting up to speed with MirrorMaker 2

More and more enterprises are relying on Apache Kafka to run their businesses. Cluster administrators need the ability to mirror data between clusters to provide high availability and disaster recovery.

MirrorMaker 2, released recently as part of Kafka 2.4.0, allows you to mirror multiple clusters and create many replication topologies. Learn all about this awesome new tool and how to reliably and easily mirror clusters.

We will first describe how MirrorMaker 2 works, including how it addresses all the shortcomings of MirrorMaker 1. We will also cover how to decide between its many deployment modes. Finally, we will share our experience running it in production as well as our tips and tricks to get a smooth ride.

Real-Time Analytics Summit 2023

April 2023, San Francisco, California, United States

Kafka Summit London 2022

April 2022, London, United Kingdom

Kafka Summit Americas 2021

September 2021

Kafka Summit APAC 2021

July 2021

Kafka Summit Europe 2021

May 2021
