Session

Turbo Charge your Lake House with Spark Streaming on Azure Databricks

Streaming is one of the buzzwords used when talking about the Lakehouse. It promises real-time analytics by enabling a continual flow of data into our analytics platforms. It's being used to power real-time processes as diverse as fraud detection, recommendation engines, stock trading, GPS tracking and social media feeds. However, for data engineers used to working with batch jobs, this can be a big paradigm shift.

In this session we take a look at Spark Structured Streaming:
- How it is architected
- What it can ingest
- How it handles state and late-arriving data
- What latency and performance to expect
- Stateless vs stateful joins

At the end of the session you'll have a good idea of what the hype around streaming actually means for your pipelines: can you improve latency and resiliency, or reduce costs, by implementing streaming pipelines?

50 Minute Session

Niall Langley

Data Engineer / Platform Architect

Bristol, United Kingdom
