Session
When GitOps Meets Streaming: Running Terabyte-Scale Flink on BYOC with Terraform, ArgoCD and Doppler
We run Apache Flink jobs processing millions of events per second with hundreds of gigabytes of RocksDB state across dozens of TaskManagers, all deployed through Git pull requests. No ClickOps. No manual Helm installs. No secrets leaving our VPC.
This talk is the story of how we built that setup from scratch for a BYOC (Bring Your Own Cloud) deployment of Ververica Platform on Amazon EKS, and every sharp edge we hit along the way.
The BYOC model promises the best of both worlds: managed control plane, your own infrastructure. But it also creates a gap that nobody talks about. You own the Kubernetes cluster, the networking, and the IAM roles. The vendor owns the agent, the CRDs, and the control plane. Traditional GitOps patterns assume full stack ownership, and they fall apart in this middle ground.
I will show you how we solved this with a production stack of three tools. Terraform handles infrastructure provisioning; there is no official BYOC Terraform module, so we built our own. ArgoCD handles continuous delivery of both the platform agent and Flink job deployments. Doppler handles secrets management, which we adopted after we discovered that the platform's built-in secrets mechanism was shipping credentials outside the customer's VPC.
In a live demo, you will see the full lifecycle: a Git commit triggers ArgoCD to deploy a Flink job, Doppler injects Kafka SASL credentials without them ever touching the vendor control plane, and Terraform manages the underlying EKS node groups that auto-scale based on state size. I will also show what a failed checkpoint recovery looks like in this setup and how GitOps-driven rollback gets you back to a healthy state in under three minutes.
You will walk away with a complete, open reference architecture you can fork and adapt for any BYOC or hybrid-managed Kubernetes deployment, whether you are running Flink, Kafka, or any vendor agent that lives inside your cluster.
Abdul Rehman is the Solutions Architecture Team Lead at Ververica, the company behind Apache Flink's commercial platform. He designs and deploys production Flink environments processing terabytes of state across AWS, Azure, and GCP, with a deep focus on Kubernetes-native architectures and BYOC deployments. Leading a team of Solutions Architects responsible for customer PoCs and production rollouts, Abdul has hands-on experience with the real scaling, security, and operational challenges that organizations hit when running managed stream processing at scale.
Abdul Rehman Zafar
Senior Solutions Architect at Ververica
Berlin, Germany