
Steffen Hausmann
Solutions Engineer at Confluent
Munich, Germany
👋 I'm Steffen. I’m excited about streaming and real-time technologies.
Over the past years, I've worked in various customer-facing roles at startups like Materialize and as a Principal Streaming Architect on the Messaging and Streaming team at Amazon Web Services. I enjoy working closely with customers to help them adopt technologies that let them gain timely insights from, and act on, their continually changing data.
But my passion for streaming technologies started much earlier, during my PhD on Complex Event Processing at the University of Munich. I enjoy sharing my knowledge and speaking at conferences like Flink Forward, Kafka Summit, and AWS re:Invent. I've contributed to open source projects like Apache Flink, and in my free time I'm trying to lure our daughters into tech with the cute stickers I collect at events.
Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics
Stream processing facilitates the collection, processing, and analysis of real-time data and enables the continuous generation of insights and quick reactions to emerging situations. Yet, despite these advantages over traditional batch-oriented analytics applications, streaming applications are much more challenging to operate. Some of these challenges include maintaining low end-to-end latency, recovering seamlessly from failure, and dealing with varying throughput.
We all know and love Flink for taking on these challenges with grace. In this session, we explore an end-to-end example that shows how you can use Apache Flink and Amazon Kinesis Data Analytics for Java Applications to build a reliable, scalable, and highly available streaming application. We discuss how you can leverage managed services to quickly build Flink-based streaming applications and show how managed services can substantially reduce the operational overhead required to run the application. We also review best practices for running streaming applications with Apache Flink on AWS.
You will not only see how to build streaming applications with Apache Flink on AWS, but also learn how leveraging managed services can reduce the overhead of building and operating streaming applications to a bare minimum.
https://www.youtube.com/watch?v=c03_TaW2pR0
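For a flavor of what such an application looks like in code, here is a minimal sketch of a Flink job that reads records from a Kinesis data stream with the Flink Kinesis connector; the stream name, region, and transformation are placeholder assumptions, not the session's actual code.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisFlinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Consumer configuration; region and initial position are illustrative.
        Properties consumerConfig = new Properties();
        consumerConfig.setProperty(ConsumerConfigConstants.AWS_REGION, "eu-west-1");
        consumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

        // Read raw events from a Kinesis data stream (placeholder stream name).
        DataStream<String> events = env.addSource(
                new FlinkKinesisConsumer<>("taxi-trip-events", new SimpleStringSchema(), consumerConfig));

        // Placeholder transformation; a real job would parse, window, and aggregate here.
        events.map(String::toUpperCase).print();

        env.execute("flink-on-kinesis-data-analytics");
    }
}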
Workshop: Build a Unified Batch and Stream Processing Pipeline with Apache Beam on AWS
In this workshop, we explore an end-to-end example that combines batch and streaming aspects in one uniform Beam pipeline. We start by analyzing incoming taxi trip events in near real time with an Apache Beam pipeline. We then show how to archive the trip data to Amazon S3 for long-term storage. We subsequently explain how to read the historic data from S3 and backfill new metrics by executing the same Beam pipeline in a batch fashion. Along the way, you also learn how to deploy and execute the Beam pipeline with Amazon Kinesis Data Analytics in a fully managed environment.
You will not only learn how to leverage Beam's expressive programming model to unify batch and streaming, but also how AWS can help you effectively build and operate Beam-based streaming architectures with low operational overhead.
https://www.youtube.com/watch?v=K6hVR-URTYU
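As a rough illustration of the kind of pipeline built in the workshop, here is a minimal Beam sketch that windows trip events and counts them per window; the S3 paths and the metric itself are illustrative placeholders, not the workshop's actual code.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class TaxiTripMetricsPipeline {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            // Placeholder source: the workshop reads trip events from a stream,
            // but the same transforms also work against archived files in S3.
            .apply("ReadTrips", TextIO.read().from("s3://example-bucket/taxi-trips/*"))
            // Assign events to five-minute windows.
            .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
            // Count trips per distinct payload (a stand-in for a real per-zone metric).
            .apply("Count", Count.perElement())
            .apply("Format", MapElements.into(TypeDescriptors.strings())
                .via((KV<String, Long> kv) -> kv.getKey() + "," + kv.getValue()))
            .apply("WriteMetrics", TextIO.write().to("s3://example-bucket/metrics/trips"));

        pipeline.run();
    }
}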
Navigating private network connectivity options for Kafka clusters
There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we'll discuss how you can use SSH bastions or a self-managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds.
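To illustrate the problem space, here is a sketch of the naive approach: a plain Kafka client pointed at an SSH tunnel. Hostnames, ports, and topic are placeholders, and the comments call out the advertised-listener pitfall that motivates the approaches covered in the session.

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunneledConsumer {
    public static void main(String[] args) {
        // Assumes an SSH tunnel has already been opened locally, e.g.
        //   ssh -N -L 9092:broker-1.internal:9092 user@bastion.example.com
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "tunnel-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Caveat: the bootstrap connection goes through the tunnel, but after the
        // initial metadata request the client reconnects to the brokers' advertised
        // listeners, which are typically not reachable through a single tunnel.
        // Making this work for every broker without fragile workarounds is exactly
        // the gap the session discusses.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic"));
            consumer.poll(Duration.ofSeconds(5)).forEach(r -> System.out.println(r.value()));
        }
    }
}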
A Beginner’s Guide to Kafka Performance in Cloud Environments
Over time, deploying and running Kafka has become easier and easier. Today you can choose among a large ecosystem of managed offerings or just deploy to Kubernetes directly. But although you have plenty of options to optimize your Kafka configuration and choose infrastructure that matches your use case and budget, it's not always easy to tell how these choices affect overall cluster performance.
In this session, we’ll take a look at Kafka performance from an infrastructure perspective. How does your choice of storage, compute, and networking affect cluster throughput? How can you optimize for low cost or fast recovery? When is it better to scale up rather than to scale out brokers?
You’ll walk away from this session with a mental model that allows you to better understand the limits of your clusters. You can use this knowledge to make informed decisions on how to achieve the throughput, availability, and durability required for your use cases while optimizing infrastructure cost.
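As a taste of the kind of back-of-the-envelope reasoning involved, the following sketch estimates per-broker disk and network load from a few assumed inputs; all numbers are illustrative assumptions rather than recommendations, and they ignore effects such as page-cache hits and uneven partition placement.

public class ClusterSizingEstimate {
    public static void main(String[] args) {
        // Illustrative assumptions, not measurements.
        double producerIngressMBs = 100.0;  // aggregate producer throughput (MB/s)
        int replicationFactor = 3;
        int consumerGroups = 2;             // each group reads the full stream once
        int brokers = 6;

        // Every produced byte is written to disk once per replica.
        double clusterDiskWriteMBs = producerIngressMBs * replicationFactor;

        // Followers fetch each byte once, so replication adds (RF - 1) x ingress
        // of broker-to-broker traffic on top of the producer traffic.
        double replicationNetworkMBs = producerIngressMBs * (replicationFactor - 1);

        // Consumers read the full stream once per group.
        double consumerEgressMBs = producerIngressMBs * consumerGroups;

        System.out.printf("Per-broker disk write:  %.1f MB/s%n", clusterDiskWriteMBs / brokers);
        System.out.printf("Per-broker network in:  %.1f MB/s%n",
                (producerIngressMBs + replicationNetworkMBs) / brokers);
        System.out.printf("Per-broker network out: %.1f MB/s%n",
                (replicationNetworkMBs + consumerEgressMBs) / brokers);
    }
}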
One sink to rule them all: Introducing the new Async Sink
Next time you want to integrate with a new destination for a demo, proof of concept, or production application, the Async Sink framework will bootstrap development, allowing you to move quickly without compromise. In Flink 1.15 we introduced the Async Sink base (FLIP-171) with the goal of encapsulating common logic and allowing developers to focus on the key integration code. The new framework handles concerns like request batching, buffering records, applying backpressure, retry strategies, and at-least-once semantics. It allows you to focus on your business logic rather than spending time integrating with your downstream consumers. During the session we will dive deep into the internals to uncover how it works, why it was designed this way, and how to use it. We will code up a new sink from scratch and demonstrate how to quickly push data to a destination. At the end of this talk you will be ready to start implementing your own Flink sink using the new Async Sink framework.
https://www.youtube.com/watch?v=z-hYuLgbHuo
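To give an impression of the usage side, here is a minimal sketch that configures a sink built on the Async Sink framework (the Kinesis Data Streams sink introduced alongside it in Flink 1.15); the stream name, region, and builder values are placeholder assumptions rather than recommended settings.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AsyncSinkUsage {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("a", "b", "c");  // placeholder input

        Properties sinkProperties = new Properties();
        sinkProperties.put("aws.region", "eu-west-1");  // placeholder region

        // Batching, buffering, and backpressure are handled by the Async Sink base;
        // the destination-specific sink only contributes the request logic.
        KinesisStreamsSink<String> sink = KinesisStreamsSink.<String>builder()
                .setKinesisClientProperties(sinkProperties)
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                .setStreamName("example-output-stream")  // placeholder stream name
                .setMaxBatchSize(500)
                .setMaxInFlightRequests(50)
                .setMaxBufferedRequests(10_000)
                .build();

        events.sinkTo(sink);
        env.execute("async-sink-usage");
    }
}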
Unify Batch and Stream Processing with Apache Beam on AWS
One of the big visions of Apache Beam is to provide a single programming model for both batch and streaming that runs on multiple execution engines.
In this session, we explore an end-to-end example that shows how you can combine batch and streaming aspects in one uniform Beam pipeline: We start by ingesting taxi trip events into an Amazon Kinesis data stream and use a Beam pipeline to analyze the streaming data in near real time. We then show how to archive the trip data to Amazon S3 and how we can extend and update the Beam pipeline to generate additional metrics from the streaming data going forward. We subsequently explain how to backfill the added metrics by executing the same Beam pipeline in a batch fashion against the archived data in S3. Along the way, we also discuss how to leverage different execution engines, such as Amazon Kinesis Data Analytics for Java and Amazon Elastic MapReduce, to run Beam pipelines in a fully managed environment.
You will not only learn how to leverage Beam's expressive programming model to unify batch and streaming, but also how AWS can help you effectively build and operate Beam-based streaming architectures with low operational overhead.
https://www.youtube.com/watch?v=eCgZRJqdt_I
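One way to picture the unification is a single reusable transform that can be applied unchanged to a bounded or an unbounded input; the following sketch is illustrative only and not the session's actual code.

import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/**
 * A reusable transform that computes per-window trip counts. Because it only
 * depends on the element type, it can be applied to an unbounded PCollection
 * read from a Kinesis stream as well as to a bounded PCollection read from
 * archived files in S3, which is what makes the backfill described above work.
 */
public class TripCountsPerWindow
        extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {

    @Override
    public PCollection<KV<String, Long>> expand(PCollection<String> trips) {
        return trips
            .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
            .apply("Count", Count.perElement());
    }
}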
Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
The increasing number of data sources in today's application stacks has created a demand to continuously capture and process data from various sources and quickly turn high-volume streams of raw data into actionable insights. Apache Flink addresses many of the challenges faced in this domain, as it is specifically tailored to distributed computations over streams. While Flink provides all the necessary capabilities to process streaming data, provisioning and maintaining a Flink cluster still requires considerable effort and expertise. We will discuss how cloud services can remove most of the burden of running the clusters underlying your Flink jobs and explain how to build a real-time processing pipeline on top of AWS by integrating Flink with Amazon Kinesis and Amazon EMR. We will furthermore illustrate how to leverage the reliable, scalable, and elastic nature of the AWS cloud to effectively create and operate your real-time processing pipeline with little operational overhead.
https://www.youtube.com/watch?v=tmdEe3jpUX8
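As a minimal illustration of the processing step, here is a sketch of a keyed, windowed aggregation in Flink's DataStream API; the input elements and window size are placeholders, and a real pipeline would read from Kinesis instead of hard-coded elements.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TripCountsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; in the pipeline described above the events would
        // arrive from a Kinesis stream via the Flink Kinesis connector.
        DataStream<Tuple2<String, Integer>> trips = env.fromElements(
                Tuple2.of("downtown", 1), Tuple2.of("airport", 1), Tuple2.of("downtown", 1));

        // Count trips per pickup zone in one-minute tumbling windows.
        trips.keyBy(t -> t.f0)
             .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
             .sum(1)
             .print();

        env.execute("trip-counts");
    }
}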
Building real-time applications using Apache Flink
Apache Flink is a framework and engine for building streaming applications for use cases such as real-time analytics and complex event processing. This session covers best practices for building low-latency applications with Apache Flink when reading data from either Amazon MSK or Amazon Kinesis Data Streams. It also covers best practices for running low-latency Apache Flink applications using Amazon Kinesis Data Analytics and discusses AWS’s open-source contributions to this use case.
https://www.youtube.com/watch?v=xu3A_7DcRgQ
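Two of the knobs commonly discussed for low-latency Flink applications are the network buffer timeout and a tight watermark strategy; the sketch below shows where they are set, with placeholder values rather than recommendations.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LowLatencySettings {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flush network buffers after at most 5 ms instead of waiting for them to
        // fill, trading some throughput for lower end-to-end latency.
        env.setBufferTimeout(5);

        // Keep the allowed out-of-orderness small so that event-time windows close
        // quickly; mark idle sources so they don't hold back watermarks.
        WatermarkStrategy<String> watermarks =
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner((event, ts) -> System.currentTimeMillis())
                        .withIdleness(Duration.ofSeconds(10));

        // A real job would attach the watermark strategy to its MSK or Kinesis
        // source and continue with its transformations here.
        env.fromElements("event").assignTimestampsAndWatermarks(watermarks).print();

        env.execute("low-latency-settings");
    }
}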
Build Your First Big Data Application on AWS
AWS makes it easy to build and operate a highly scalable and flexible data platform to collect, process, and analyze data so you can get timely insights and react quickly to new information. In this session, we will demonstrate how you can quickly build a fully managed data platform that transforms, cleans, and analyzes incoming data in real time and persists the cleaned data for subsequent visualization and exploration by means of SQL. To this end, we will build an end-to-end streaming data solution using Kinesis Data Streams for data ingestion and Kinesis Data Analytics for real-time outlier and hotspot detection, and show how the incoming data can be persisted by means of Kinesis Data Firehose to make it available to Amazon Athena and Amazon QuickSight for data exploration and visualization.
https://www.youtube.com/watch?v=Y-jIhPYW8Ms
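For illustration, data could be pushed into the ingestion stream with a few lines against the AWS SDK for Java; the region, stream name, and payload below are placeholders, not the session's actual code.

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class TripEventProducer {
    public static void main(String[] args) {
        // Region and stream name are placeholders.
        try (KinesisClient kinesis = KinesisClient.builder().region(Region.EU_WEST_1).build()) {
            String event = "{\"tripId\": 42, \"fare\": 12.5}";  // illustrative payload

            // Write a single record; a real producer would batch and retry.
            kinesis.putRecord(PutRecordRequest.builder()
                    .streamName("taxi-trip-events")
                    .partitionKey("trip-42")
                    .data(SdkBytes.fromUtf8String(event))
                    .build());
        }
    }
}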
Powering real-time loan underwriting at Vontive with Materialize
In the fast-paced world of mortgage lending, speed and accuracy are crucial. To support their underwriters, Vontive transformed written rules for loan eligibility from a Google Doc into SQL queries for evaluation in a Postgres database. However, while functional, this setup struggled to scale with business growth, resulting in slow, cumbersome processing times. Executing just a handful of loan eligibility rules could take up to 27 seconds, far too long for user-friendly interactions.
In this session, we'll explore how Vontive reimagined its underwriting operations using Materialize. By offloading complex SQL queries from Postgres to Materialize, Vontive reduced eligibility check times from 27 seconds to under a second. This not only sped up decision-making but also removed limitations on the number of SQL-based underwriting rules, allowing underwriters to process more loans with greater accuracy and confidence. Additionally, this shift enabled the team to implement more automated checks throughout the underwriting process, catching errors earlier and further streamlining operations. Engineering needs were minimal, since dbt supports both cloud-based Postgres and Materialize. Whether you're in financial services or any data-driven industry, this session offers valuable insights into leveraging fast-changing data for high-stakes decision-making with confidence.
https://www.youtube.com/watch?v=qLBH1nMQAZ8
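As a rough sketch of the pattern (not Vontive's actual rules or schema), a materialized view in Materialize can be created and queried over the Postgres wire protocol with a standard JDBC driver; the host, credentials, table, and eligibility rule below are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EligibilityCheck {
    public static void main(String[] args) throws Exception {
        // Materialize speaks the Postgres wire protocol, so the standard Postgres
        // JDBC driver works. Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://materialize.example.com:6875/materialize", "app", "secret");
             Statement stmt = conn.createStatement()) {

            // Maintain a (hypothetical) eligibility rule incrementally instead of
            // re-running the query for every request.
            stmt.execute(
                "CREATE MATERIALIZED VIEW eligible_loans AS " +
                "SELECT loan_id FROM loan_applications " +
                "WHERE ltv <= 0.75 AND credit_score >= 680");

            // Reads against the view return incrementally maintained results.
            try (ResultSet rs = stmt.executeQuery("SELECT count(*) FROM eligible_loans")) {
                while (rs.next()) {
                    System.out.println("eligible loans: " + rs.getLong(1));
                }
            }
        }
    }
}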
Coalesce 2024
Powering real-time loan underwriting at Vontive with Materialize
Kafka Summit London 2024 Sessionize Event
Kafka Summit London 2023 Sessionize Event
Flink Forward San Francisco 2022
One sink to rule them all: Introducing the new Async Sink
AWS re:Invent 2020
Building real-time applications using Apache Flink
Beam Digital Summit 2020
Workshop: Build a Unified Batch and Stream Processing Pipeline with Apache Beam on AWS
AWS re:Invent 2019
Choosing the right service for your data streaming needs (ANT316)
Build real-time analytics for a ride-sharing app (ANT401)
Flink Forward Europe 2019
Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics
Beam Summit Europe 2019
Unify Batch and Stream Processing with Apache Beam on AWS
AWS Summit Berlin 2019
Build Your First Big Data Application on AWS
Flink Forward Berlin 2017
Build a Real-time Stream Processing Pipeline with Apache Flink on AWS