
Hakan Lofcali
co-founder, CTO @ DataCater, previously AWS and ING Analytics
Düsseldorf, Germany
Hakan is passionate about data platforms and about helping companies deliver better products to their customers, faster. He has built his expertise in software engineering, data engineering, and cloud-native computing across a range of environments: from early-stage start-ups to AWS, and from sports media companies to highly regulated financial services (FSI) enterprises. The experience he gained, the problems he encountered, and the solutions he found led him to co-found DataCater to improve tooling in the data space.
Topics
Unlocking A Harder, Better, Stronger Kafka Connect With Kubernetes
Kafka Connect executes multiple connectors in the same process and does not offer any mechanism for isolating connectors. All connectors share the same resources (vCPUs, memory, etc.), which has the unfortunate side effect that excessive resource consumption by a single connector can impair the health of the rest of the Kafka Connect cluster.
This talk proposes a novel, cloud-native deployment model for Kafka Connect that uses Kubernetes concepts to execute, scale, and isolate individual Kafka Connect connectors. In a nutshell, we build a dedicated container image for each Kafka Connect connector type. We run connectors as Kubernetes Deployments, which lets us either set the number of connector instances (or tasks) manually or let Kubernetes scale the connectors elastically. We use Kubernetes resource management to declare the resource requests and limits of individual connector instances. As a result, we achieve fully self-contained connectors, a necessity for production deployments of Kafka Connect.
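To make the deployment model above more concrete, here is a minimal sketch that builds the Kubernetes objects for one connector type as plain Python dicts. Kubernetes accepts JSON manifests, so the output can be piped to `kubectl apply -f -`. The sketch has two parts. The first is a Deployment with per-container resource requests and limits. The second is a HorizontalPodAutoscaler (`autoscaling/v2`) for elastic scaling. The connector name, image, resource values, and scaling thresholds are all hypothetical. How the container actually starts its connector (its entrypoint and connector configuration) is an assumption this sketch leaves out. This is not the talk's actual implementation.

```python
import json


def connector_deployment(name, image, replicas=1,
                         cpu_request="500m", mem_request="1Gi",
                         cpu_limit="1", mem_limit="2Gi"):
    """Deployment manifest running one connector type in its own pods.

    All default resource values are illustrative placeholders.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # fixed instance count when not autoscaled
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        # Hypothetical per-connector-type image.
                        "image": image,
                        # Requests/limits isolate this connector from others.
                        "resources": {
                            "requests": {"cpu": cpu_request, "memory": mem_request},
                            "limits": {"cpu": cpu_limit, "memory": mem_limit},
                        },
                    }],
                },
            },
        },
    }


def connector_hpa(name, min_replicas=1, max_replicas=5, target_cpu_percent=70):
    """HorizontalPodAutoscaler scaling the connector Deployment on CPU usage."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": target_cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical connector; print a v1 List that kubectl can apply.
    items = [
        connector_deployment("postgres-source",
                             "registry.example.com/connect-postgres-source:1.0",
                             replicas=2),
        connector_hpa("postgres-source"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

CPU-based autoscaling only works when the pods declare CPU requests, because utilization is measured relative to the request. This is one reason the resource declarations and elastic scaling fit together in this model.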
In a comprehensive evaluation, we compare the presented approach with a Strimzi-based deployment of Kafka Connect. We discuss the results and highlight the benefits and disadvantages of the presented approach (there's no free lunch!). We answer questions such as: What’s the impact of running single-connector clusters on overall resource consumption? How well does elastic scaling of connectors work? Can a single connector go rogue without affecting the rest of the cluster?
Learnings From Shipping 1000+ Streaming Data Pipelines To Production
Kafka Connect and Kafka Streams are foundational technologies in modern, real-time data architectures. They enable developers to build scalable, robust, real-time data pipelines without having to deal with the low-level consumer and producer APIs of Apache Kafka. In this talk, we share our most important, and often surprising, learnings from using Kafka Connect and Kafka Streams to ship more than 1,000 streaming data pipelines to production. The goal of this talk is to help you build mature streaming data pipelines while avoiding the common pitfalls.
We walk you through our journey of adopting Apache Kafka, Kafka Connect, and Kafka Streams. We discuss the challenges that we faced and how we overcame them. Over the course of the talk, we provide answers to important questions, such as: Which metrics are useful for monitoring streaming data pipelines? How to deal with resource-leaking connectors impacting the health of a Kafka Connect cluster? How to start troubleshooting the performance of streaming data pipelines? How to tune Kafka Connect for handling slow data sources or data sinks? What’s missing in today’s ecosystem for streaming to become a commodity?
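The first question above concerns monitoring metrics. One widely used example is consumer lag. This is not necessarily a metric the talk itself recommends. Consumer lag is the number of records a consumer group has not yet processed. It is computed per partition as the log-end offset minus the group's committed offset. By default, Kafka Connect sink connectors consume through a consumer group named `connect-<connector name>`, so their lag can be tracked this way, for example with `kafka-consumer-groups.sh --describe`. The minimal sketch below assumes the offsets have already been fetched into dicts keyed by partition number. Treating a partition without a committed offset as lagging from offset 0 is a simplifying assumption, since retention may have moved the log start offset past 0.

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log-end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from offset 0.
        behind = end - committed.get(partition, 0)
        # Clamp at zero in case the offsets were sampled at slightly different times.
        lag[partition] = max(behind, 0)
    return lag


def total_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum of per-partition lag: a single number to alert on."""
    return sum(consumer_lag(end_offsets, committed).values())


if __name__ == "__main__":
    end = {0: 1500, 1: 900, 2: 1200}
    committed = {0: 1450, 1: 900}  # partition 2 has no committed offset yet
    print(consumer_lag(end, committed))
    print(total_lag(end, committed))
```

A total lag that keeps growing usually means the pipeline cannot keep up with its input. The per-partition view helps spot skew, such as a single hot partition or a stuck task.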
Cloud-native ETL with Java Quarkus, Kubernetes, and Jib Container Builder
DataCater unlocks more value from organizations' data, faster. This talk walks you through our stack, architecture, and processes. We develop tools to deploy and run data-driven applications in a cloud-native environment.
We give a whirlwind tour of developing a Java Quarkus application and of a CI/CD stack powered by GitHub Actions and Argo CD. We also cover building and deploying containerized Kafka Streams applications at runtime with the Jib container builder.
With this common foundation in place, we give a high-level overview of how we use modern Kubernetes and cloud tooling to manage multiple clusters across different organizations, together with our customers.
Kubernetes, Container, Streaming, Python