Speaker

Sharon Xie

Founding Engineer and Head of Product

Sharon is the Head of Product at Decodable, where she shapes the product vision and drives the development of the company’s real-time data platform. She first joined Decodable as a founding engineer, bringing more than a decade of full-stack engineering experience spanning web applications, data platforms, and underlying infrastructure. Before joining Decodable, she served as the technical lead for the real-time data platform at Splunk, focusing on the streaming query language and developer SDKs. Over the past seven years, she has specialized in designing and operating real-time data systems, developing extensive expertise in Apache Flink, Apache Kafka, and Debezium.

Unified CDC Ingestion and Processing with Apache Flink and Iceberg

Apache Iceberg is a robust foundation for large-scale data lakehouses, yet its incremental processing model lacks native support for CDC, making updates and deletes challenging. While many teams turn to Kafka and Flink for CDC processing, this comes with high infrastructure costs and operational complexity.
We needed a cost-effective solution with minute-level latency that supports dozens of terabytes of CDC data processing per day. Since we were already using Flink for Iceberg ingestion, we set out to extend it for CDC processing as well.

In this session, we’ll share how we tackled this challenge by writing change data streams as append tables and reading append tables as change streams. This approach makes Iceberg tables function like Kafka topics, with two added benefits:
1. Iceberg tables remain directly queryable, making troubleshooting and application integration more approachable and streamlined.
2. Similar to Kafka consumers, multiple engines can independently process Iceberg tables. However, unlike Kafka clusters, there is no need to scale infrastructure.
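
The idea of treating an append-only table as a change stream can be sketched in a few lines of plain Python. This is only an illustrative model, not the implementation presented in the session and not the Iceberg or Flink API: the `ChangeRow` type, the `seq` ordering column, and the helper names are all hypothetical, and the op codes simply follow the Debezium convention (`c`, `u`, `d`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRow:
    seq: int              # hypothetical ordering column in the append-only table
    op: str               # Debezium-style op code: "c" insert, "u" update, "d" delete
    key: int
    value: Optional[str]  # None for deletes


def read_changes(rows, after_seq):
    """Incremental read: rows appended after a consumer's last seen position."""
    return [r for r in sorted(rows, key=lambda r: r.seq) if r.seq > after_seq]


def materialize(rows):
    """Replay the change rows in order and keep the latest value per key."""
    state = {}
    for row in sorted(rows, key=lambda r: r.seq):
        if row.op == "d":
            state.pop(row.key, None)
        else:
            state[row.key] = row.value
    return state


# Every change is appended as a new row; nothing is updated in place.
changelog = [
    ChangeRow(1, "c", 1, "alice"),
    ChangeRow(2, "c", 2, "bob"),
    ChangeRow(3, "u", 1, "alice2"),
    ChangeRow(4, "d", 2, None),
]

current_state = materialize(changelog)      # latest value per key
new_rows = read_changes(changelog, 2)       # what a consumer at position 2 still has to read
```

Each consumer tracks its own position, much like a Kafka consumer offset, while the full table stays available for ad hoc queries.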

We will also explore optimization opportunities with Iceberg and Flink, including when to materialize tables and how to choose between append and upsert modes to enhance integration. If you’re working on data processing over Iceberg, this session will provide practical, battle-tested strategies to overcome limitations and scale efficiently while keeping the infrastructure simple.

Comparing Apache Flink and Spark for Modern Stream Data Processing

Real-time data processing is essential for staying competitive in today’s fast-paced business environment, and choosing the right tool is a key decision. Apache Flink and Spark Structured Streaming are two leading stream processing frameworks, each with unique strengths and trade-offs.

This talk takes a look at our journey at Decodable, where we evaluated both tools and ultimately chose Apache Flink over Spark Structured Streaming for our stream data processing needs. By examining key differences between the two systems, we aim to provide a clear, technical comparison that will help you make informed decisions for your streaming data use cases.

Join us for this talk where we will discuss:
1. Design philosophies: Learn about the origins of both systems and some of the fundamental architectural design choices of Flink that make it more attractive for streaming use cases.
2. (Stateful) streaming capabilities: We will compare similar features that Spark and Flink offer across their APIs, and share some features available only in Flink that make it a much richer streaming library. We will also cover some of the data ecosystem tools and connectors that Flink supports natively, like Debezium.
3. Production readiness: We will discuss some of the recent Flink features that make running Flink at scale easy, like the Kubernetes operator and its sophisticated auto-scaler.

Timing is Everything: Understanding Event-Time Processing in Flink SQL

In stream processing, event-time processing means that events are processed based on when they occurred, rather than when they are observed by the system (processing time). Apache Flink has a powerful framework for event-time processing, which plays a pivotal role in ensuring temporal order and result accuracy.

In this talk, we will introduce Flink event-time semantics and use Flink SQL to demonstrate how watermarks, a means of handling late-arriving events, are generated, propagated, and triggered. We will explore operators such as windows and joins that are often used with event-time processing, and how different configurations can impact processing speed, cost, and correctness.
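
As a rough illustration of how a watermark decides when a window fires and which events count as late, here is a simplified Python model. It is not Flink code and glosses over details of Flink's actual watermark arithmetic; the function name, the integer timestamps, and the sample events are assumptions made for this sketch. The watermark trails the largest event time seen so far by a fixed out-of-orderness bound.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per tumbling event-time window.

    `events` is a list of (event_time, payload) tuples in arrival order.
    Returns (emitted, dropped): fired windows as (window_start, count),
    and events that arrived after their window had already fired.
    """
    open_windows = defaultdict(int)
    emitted, dropped = [], []
    watermark = float("-inf")

    for event_time, payload in events:
        window_start = event_time - (event_time % window_size)
        if window_start + window_size <= watermark:
            # The watermark already passed this window's end: the event is late.
            dropped.append((event_time, payload))
        else:
            open_windows[window_start] += 1

        # The watermark trails the highest event time seen by a fixed bound.
        watermark = max(watermark, event_time - max_out_of_orderness)

        # Fire every window whose end is at or before the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                emitted.append((start, open_windows.pop(start)))

    return emitted, dropped


events = [(1, "a"), (4, "b"), (11, "c"), (9, "d"), (12, "e"), (3, "f"), (25, "g")]
emitted, dropped = tumbling_window_counts(events, window_size=10, max_out_of_orderness=2)
print(emitted)  # [(0, 3), (10, 2)]
print(dropped)  # [(3, 'f')]
```

Event `d` (time 9) arrives out of order but is still counted, because the watermark has not yet reached the end of its window. Event `f` (time 3) arrives after that window has fired and is dropped. A larger out-of-orderness bound would admit more late events at the cost of delaying results, which is the kind of trade-off the abstract refers to.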

Join us for this exploration where event-time theory meets practical SQL implementation, providing you with the tools to make informed trade-offs.

Kubernetes-like Reconciliation Protocol for Managed Flink Services

Want your Flink jobs to keep running without failures? Inspired by the robustness of Kubernetes, we created a managed Flink service that brings a similar experience. Users specify the desired Flink job states, and our platform ensures Flink jobs remain in that state. We embraced Kubernetes-style reconciliation loops: constant monitoring, comparison of actual and desired states, and proactive actions to resolve any issues.
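
The general shape of such a reconciliation loop can be sketched as follows. This is a generic toy model in Python, not the protocol described in the talk: the `JobState` values, the action names, and the in-memory `actual` dictionary standing in for a real cluster are all assumptions made for illustration.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


def plan_actions(desired, actual):
    """Compare desired and actual job states and return the actions needed."""
    actions = []
    for job_id, want in sorted(desired.items()):
        if actual.get(job_id) == want:
            continue  # already converged, nothing to do
        verb = "start" if want is JobState.RUNNING else "stop"
        actions.append((verb, job_id))
    # Jobs that exist but are no longer declared should be removed.
    for job_id in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", job_id))
    return actions


def reconcile_once(desired, actual):
    """Run one pass of the loop against an in-memory stand-in for a cluster."""
    actions = plan_actions(desired, actual)
    for verb, job_id in actions:
        if verb == "start":
            actual[job_id] = JobState.RUNNING
        elif verb == "stop":
            actual[job_id] = JobState.STOPPED
        else:
            del actual[job_id]
    return actions


desired = {"orders": JobState.RUNNING, "audit": JobState.STOPPED}
actual = {"orders": JobState.STOPPED, "audit": JobState.STOPPED, "legacy": JobState.RUNNING}

first_pass = reconcile_once(desired, actual)   # actions needed to converge
second_pass = reconcile_once(desired, actual)  # no further actions once converged
```

Because each pass derives its actions only from the difference between desired and actual state, running it repeatedly is safe: once the states match, a pass produces no actions.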

We've diverged from the conventional Kubernetes operator approach. Our implementation enables a single control plane to manage multiple data planes, and allows relocating Flink jobs to different Kubernetes clusters for cluster utilization and disaster recovery scenarios. With Debezium integration at its core, our reconciliation protocol guarantees efficiency and scalability.

In this talk, you will learn how we designed and implemented such a reconciliation protocol, including various reconciliation methods tailored to the unique demands of Flink.

The top 3 challenges running multi-tenant Flink at scale

Apache Flink is the foundation for Decodable's real-time SaaS data platform. Flink runs critical data processing jobs with strong security requirements. In addition, Decodable has to scale to thousands of tenants, power various use cases, provide an intuitive user experience, and maintain cost-efficiency. We've learned many lessons while building and maintaining the platform. In this talk, I'll share the three toughest challenges of building and operating this platform with Flink, and how we solved them.
