Session

Everything you wanted to know about Streaming Lakehouses but were afraid to ask.

A lakehouse is a transformative approach to data management that merges the best attributes of data lakes and traditional data warehouses: the scalability and cost-effectiveness of a data lake with the reliability, structure, and performance of a data warehouse.

In this talk, we will guide you through building a data ingestion and transformation pipeline that streams data from the edge all the way to your streaming lakehouse using an entirely open-source technology stack. We will show you how easy it is to offload your streaming data to tiered storage in a lakehouse-native format such as Delta Lake, Apache Hudi, or Apache Iceberg.

We will conclude by demonstrating how easy it is to query your lakehouse-formatted data stream using query engines like Flink or Spark, allowing you to analyze streaming data quickly and cost-effectively.

David Kjerrumgaard

Committer on the Apache Pulsar Project | Published Author | International Speaker | Big Data Expert

Las Vegas, Nevada, United States
