
Talat Uyarer
Senior Staff Software Engineer at Google
San Francisco, California, United States
Talat is a Senior Staff Software Engineer at Google and an Apache Member. He's an active contributor to open source projects and is passionate about building scalable data processing systems. Prior to Google, Talat worked on the Cortex Data Lake team at Palo Alto Networks, where he helped secure customers by developing a streaming data platform leveraging Apache Kafka, Apache Beam, and Apache Flink.
Flink, We Have a Problem: A Post-Mortem on Self-Managed Streaming
In the fast-evolving landscape of real-time data processing, choosing the right stream processing framework is paramount. Two years ago, we embarked on an ambitious journey to migrate our entire data processing stack to Apache Flink, driven by the promise of ultimate flexibility, open-source community support, and granular control over our streaming infrastructure.
This talk chronicles our experience of operating a large-scale Flink deployment, managing dozens of complex pipelines processing terabytes of data daily. We will take a technical deep dive into the challenges we encountered at scale, including the operational overhead of managing Flink clusters on Kubernetes, the complexities of state management and checkpointing for massive stateful applications, and the nuances of performance tuning in a multi-tenant environment.
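To make the operational surface concrete: self-managing Flink on Kubernetes means owning deployment specs like the sketch below. This is a minimal, illustrative example using the Flink Kubernetes Operator's FlinkDeployment resource, not our production configuration; the names, paths, and resource sizes are placeholders.

```yaml
# Minimal FlinkDeployment sketch (Flink Kubernetes Operator); all values illustrative.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: events-pipeline
spec:
  image: flink:1.17
  flinkVersion: v1_17
  serviceAccount: flink
  flinkConfiguration:
    # The state and checkpointing knobs behind the challenges described above.
    state.backend: rocksdb
    state.checkpoints.dir: s3://my-bucket/checkpoints/events-pipeline
    execution.checkpointing.interval: "60s"
  jobManager:
    resource: { memory: "2048m", cpu: 1 }
  taskManager:
    resource: { memory: "4096m", cpu: 2 }
  job:
    jarURI: local:///opt/flink/usrlib/events-pipeline.jar
    parallelism: 8
    upgradeMode: savepoint  # preserve state across upgrades via savepoints
```

Every field here is something the team, not a managed service, is on the hook for: image upgrades, checkpoint storage, memory sizing, and savepoint-based restarts.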
However, as our data ecosystem matured, the very control that drew us to Flink became a significant operational burden. This session will candidly explore the critical factors that led to our strategic decision to migrate back to a managed service, specifically Google Dataflow. We will present a comparative analysis of the two frameworks, focusing on total cost of ownership (TCO), developer productivity, and operational resilience.
Attendees will gain valuable insights into the trade-offs between self-managed and serverless stream processing. We'll share our migration playbook, covering everything from initial pipeline compatibility assessments to the execution of a zero-downtime cutover. This is a story of pragmatism over purity, offering practical lessons for any organization navigating the complex decision of which stream processing engine will truly allow them to sail, not just scale.
Unlocking No-Code Data Pipelines with Beam YAML: A Flink-Centric Exploration of Practical Use Cases
Apache Beam offers a powerful unified programming model, but its SDK-centric approach can sometimes present a steep learning curve for users less familiar with traditional coding or the intricacies of distributed systems. Enter Beam YAML: a game-changing declarative syntax that empowers users to define and execute complex data pipelines without writing a single line of code.
This session will provide a practical, Flink-centric exploration of Beam YAML, demonstrating how it significantly lowers the barrier to entry for data processing. We'll begin with an overview of Beam YAML's core concepts, illustrating its intuitive, human-readable structure for defining sources, transforms, and sinks. Attendees will discover how familiar Beam operations translate seamlessly into declarative YAML, enabling rapid pipeline prototyping and deployment.
Crucially, we'll dive into compelling, real-world use cases, showcasing how Beam YAML can be leveraged effectively on Apache Flink.
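To give a flavor of the syntax before the use cases: below is a minimal sketch of a Beam YAML pipeline. The transform names follow Beam YAML's built-in turnkey transforms; the bucket paths, field names, and filter condition are illustrative.

```yaml
# pipeline.yaml — a small, illustrative Beam YAML pipeline.
pipeline:
  type: chain                  # each transform feeds the next; no explicit wiring
  transforms:
    - type: ReadFromCsv        # schema is inferred from the CSV header
      config:
        path: gs://my-bucket/events/*.csv
    - type: Filter             # keep only the rows of interest
      config:
        language: python
        keep: 'severity == "HIGH"'
    - type: MapToFields        # project and rename fields with simple expressions
      config:
        language: python
        fields:
          source_ip: src_ip
          event_time: timestamp
    - type: WriteToJson
      config:
        path: gs://my-bucket/filtered/output.json
```

Running this on Flink is then, to a first approximation, a matter of submitting the file with a Flink runner, for example `python -m apache_beam.yaml.main --yaml_pipeline_file=pipeline.yaml --runner=FlinkRunner`; the exact flags depend on your Beam release.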
Attendees will leave with a clear understanding of when and how to leverage Beam YAML to accelerate their data initiatives on Flink. We'll discuss its benefits for data analysts, domain experts, and even seasoned engineers seeking to simplify and standardize pipeline definitions. Join us to discover how Beam YAML is making unified batch and stream processing more accessible and efficient than ever before.
New Avro serialization and deserialization in Beam SQL
At Palo Alto Networks we rely heavily on Avro, using it as our primary storage format and Beam Row as our in-memory representation. We serialize and deserialize billions of Avro records per second. One day we realized that the Avro-to-Row conversion routines were consuming much of our CPU time. Then the story begins...
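For context, the conversion in question is the bridge between Avro GenericRecords and Beam Rows. A minimal sketch of that hot path using Beam's AvroUtils is shown below; the wrapper class is illustrative (in recent Beam releases AvroUtils lives in the Java Avro extension module), and the up-front schema derivation hints at where per-record CPU time can disappear.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.utils.AvroUtils;
import org.apache.beam.sdk.values.Row;

/** Illustrative Avro <-> Beam Row bridge; this per-record conversion was the hotspot. */
public final class AvroRowBridge {
  private final org.apache.avro.Schema avroSchema;
  private final Schema beamSchema;

  public AvroRowBridge(org.apache.avro.Schema avroSchema) {
    this.avroSchema = avroSchema;
    // Derive the Beam schema once; re-deriving it per record wastes CPU.
    this.beamSchema = AvroUtils.toBeamSchema(avroSchema);
  }

  /** Deserialization direction: stored Avro record -> in-memory Beam Row. */
  public Row toRow(GenericRecord record) {
    return AvroUtils.toBeamRowStrict(record, beamSchema);
  }

  /** Serialization direction: Beam Row -> Avro record for storage. */
  public GenericRecord toAvro(Row row) {
    return AvroUtils.toGenericRecord(row, avroSchema);
  }
}
```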
Large-Scale Streaming Infrastructure Using Apache Beam and Dataflow
Cortex Data Lake collects, transforms, and integrates your enterprise's security data to enable Palo Alto Networks solutions. We built the streaming infrastructure behind it for our customers. In this talk, I will share that infrastructure and the lessons we learned while building it.
Building a Fully Managed Service for Beam Jobs with Flink on Kubernetes
At Palo Alto Networks, we run more than 10,000 Beam jobs on Dataflow. Beam's abstraction lets the same pipeline run on multiple runners, so to support multi-cloud use cases we developed a fully managed stream processing platform on Flink running on Kubernetes, powering thousands of stream processing pipelines in production without changing our business code. This platform is the backbone of other infrastructure systems such as real-time analytics and log processing, handling 10 million requests per second.
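The multi-runner claim rests on Beam's runner abstraction: the runner is a submission-time flag, not a code change. A minimal sketch (the pipeline body is elided and the flag values are illustrative):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class PortablePipeline {
  public static void main(String[] args) {
    // The runner is chosen by flags at submission time, for example:
    //   --runner=DataflowRunner --project=<gcp-project> --region=<region>
    //   --runner=FlinkRunner --flinkMaster=<host:port>
    // The business-logic transforms stay identical either way.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);
    // ... apply the same transforms regardless of the chosen runner ...
    pipeline.run().waitUntilFinish();
  }
}
```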
With Dataflow, we provided a rich authoring and testing environment that lets users create, test, and deploy their streaming jobs in a self-serve fashion within minutes. We now provide the same functionality through a Beam-on-Flink platform built on Kubernetes.
Users can focus on their business logic, leaving the platform to take care of management concerns such as resource provisioning, auto-scaling, job monitoring, alerting, and failure recovery across cloud providers.
In this talk, we will introduce the overall platform architecture, highlight the unique value it brings to stream processing at Palo Alto Networks, and share the experience and lessons we learned while building a Beam platform on Kubernetes.
