Speaker

Sharon Xie

Sharon Xie

Founding Engineer and Product Lead

Actions

Sharon is a founding engineer at Decodable. Currently she leads product management and development. She has over six years of experience in building and operating streaming data platforms, with extensive expertise in Apache Kafka, Apache Flink, and Debezium. Before joining Decodable, she served as the technical lead for the real-time data platform at Splunk, where her focus was on the streaming query language and developer SDKs.

Comparing Apache Flink and Spark for Modern Stream Data Processing

Real-time data processing is essential for staying competitive in today’s fast-paced business environment, and choosing the right tool is a key decision. Apache Flink and Spark Structured Streaming are two leading stream processing frameworks, each with unique strengths and trade-offs.

This talk takes a look at our journey at Decodable, where we evaluated both tools and ultimately chose Apache Flink over Spark Structured Streaming for our stream data processing needs. By examining key differences between the two systems, we aim to provide a clear, technical comparison that will help you make informed decisions for your streaming data use cases.

Join us for this talk where we will discuss:
1. Design philosophies: Learn about the origins of both systems and some of the fundamental architecture design choices of Flink that makes it more attractive for streaming use cases.
2. (Stateful) streaming capabilities: We will dive into and compare similar features that both Spark and Flink offer in the various APIs, we will also share some features only available in Flink that make it a much richer streaming library. We will also talk about some of the data ecosystem tools/connectors that Flink supports natively, like Debezium.
3. Production readiness: We will also talk about some of the recent features of Flink that makes running Flink at scale easy, like the Kubernetes operator and its sophisticated auto-scaler.

Timing is Everything: Understanding Event-Time Processing in Flink SQL

In the stream processing context, event-time processing means the events are processed based on when the events occurred, rather than when the events are observed (processing-time) in the system. Apache Flink has a powerful framework for event-time processing, which plays a pivotal role in ensuring temporal order and result accuracy.

In this talk, we will introduce Flink event-time semantics and demonstrate how watermarks as a means of handling late-arriving events are generated, propagated, and triggered using Flink SQL. We will explore operators such as window and join that are often used with event time processing, and how different configurations can impact the processing speed, cost and correctness.

Join us for this exploration where event-time theory meets practical SQL implementation, providing you with the tools to make informed decisions for making optimal trade-offs.

Kubernetes-like Reconciliation Protocol for Managed Flink Services

Want your Flink jobs to keep running without failures? Inspired by the robustness of Kubernetes, we created a managed Flink service that brings a similar experience. Users specify the desired Flink job states, and our platform ensures Flink jobs remain in that state. We embraced Kubernetes style reconciliation loops - constant monitoring, comparison of actual and desired states, and proactive actions to resolve any issues.

We've diverged from the conventional Kubernetes operator approach. Our implementation enables a single control plane to manage multiple data planes, and allows relocating Flink jobs to different Kubernetes clusters for cluster utilization and disaster recovery scenarios. With Debezium integration at its core, our reconciliation protocol guarantees efficiency and scalability.

In this talk, you will learn how we designed and implemented such a reconciliation protocol, including various reconciliation methods tailored to the unique demands of Flink.

Maximizing Efficiency in Apache Flink: Advanced Tuning Techniques for Optimal Resource Use

In this talk, we will dive into the heart of Apache Flink tuning, equipping attendees with practical strategies to optimize resource utilization. We'll discuss Flink's architecture, performance metrics and provide insights into advanced monitoring tools to identify bottlenecks. We will then provide strategies for memory management, task parallelism, state backend, checkpointing, slot sharing, and network considerations, showcasing how tweaks in these parameters can lead to substantial improvements in resource utilization.

The top 3 challenges running multi-tenant Flink at scale

Apache Flink is the foundation for Decodable's real-time SaaS data platform. Flink runs critical data processing jobs with strong security requirements. In addition, Decodable has to scale to thousands of tenants, power various use cases, provide an intuitive user experience and maintain cost-efficiency. We've learned a lot of lessons while building and maintaining the platform. In this talk, I'll share the top 3 toughest challenges building and operating this platform with Flink, and how we solved them.

Sharon Xie

Founding Engineer and Product Lead

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top