Session
Generative AI for Streaming Data Platforms - State of the Union
A classic data lakehouse is built on open-source table formats such as Delta.io, Iceberg, or Hudi and integrates seamlessly with big data platforms like Apache Spark and event buses like Apache Kafka. The popularity of the data lakehouse stems from its ability to combine the quality, speed, and simple SQL access of data warehouses with the cost-effectiveness, scalability, and support for unstructured data of data lakes. The success of the lakehouse OSS approach is driven by its low TCO and highlighted by its adoption by industry giants such as Amazon, Microsoft, Oracle, and Databricks.
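To make the streaming angle concrete, here is a minimal sketch of the ingest path such a lakehouse typically offers: Spark Structured Streaming reads a Kafka topic and appends the events to a Delta table. It assumes a Spark 3.x session with the Delta Lake and Kafka connector packages on the classpath; the topic name, broker address, and paths are illustrative placeholders, not part of the talk.

```python
# Minimal sketch of a streaming lakehouse ingest path (assumptions: local
# broker, Delta Lake + Kafka connector packages already on the classpath).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-to-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read a Kafka topic as an unbounded stream of key/value byte arrays.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "taxi-rides")  # hypothetical topic name
    .load()
    .select(col("value").cast("string").alias("payload"))
)

# Continuously append the raw payloads to a Delta table on the lake.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/taxi-rides")
    .start("/tmp/lakehouse/raw/taxi_rides")
)
```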
With the advent of generative AI models and the potential of combining techniques such as retrieval-augmented generation (RAG) with fine-tuning or pre-training custom LLMs, a new paradigm emerged in 2023: AI-infused lakehouses. These platforms use generative AI for code generation, natural language queries, and semantic search, enhancing governance and automating documentation.
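As an illustration of the RAG pattern behind features like natural language queries over lakehouse metadata, the sketch below retrieves the most relevant documentation snippets by embedding similarity and prepends them to the prompt. The embed() and generate() functions are stand-ins for whichever embedding model and LLM endpoint a platform actually uses; they, the sample docs, and the vector size are assumptions for illustration only.

```python
# A minimal RAG sketch: retrieve documentation by embedding similarity,
# then augment the LLM prompt with the retrieved context.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a fixed-size embedding vector for `text`."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Placeholder: call an LLM with the assembled prompt."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"

# Table documentation the retriever searches over (illustrative content).
docs = [
    "taxi_rides: one row per trip with pickup/dropoff timestamps and fare.",
    "zones: lookup table mapping location IDs to borough and zone names.",
]
doc_vectors = np.stack([embed(d) for d in docs])

def answer(question: str, top_k: int = 1) -> str:
    # Rank documentation snippets by cosine similarity to the question.
    q = embed(question)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(docs[i] for i in np.argsort(sims)[::-1][:top_k])
    # Augment the prompt with the retrieved context before generation.
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

print(answer("Which table stores fares?"))
```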
How do lakehouses, which are inherently capable of managing streaming data, adapt to the integration of new AI capabilities? Is AI in this context simply hype and marketing terminology, or is it a technology that, despite initial skepticism over its catchy name (much like 'cloud computing', 'serverless', or 'lakehouse'), is already on its way to becoming widely adopted and transformative in the field?
Be surprised, join my lightning talk, and discover how AI capabilities can enhance real-time analytics and streamline ETL. Expect an interactive, hands-on, no-nonsense demonstration using Apache Kafka and the NY Taxi dataset from Kaggle, concentrating on developer experience, operations, and governance.
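For readers who want to recreate a similar setup ahead of the talk, a minimal sketch of replaying a Kaggle NY Taxi CSV into a Kafka topic could look like the following. The file name, topic name, and broker address are assumptions, and a running Kafka broker is required.

```python
# Replay NY Taxi trip records from a CSV file into a Kafka topic
# (assumptions: kafka-python installed, broker on localhost:9092).
import csv
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

with open("train.csv", newline="") as f:  # hypothetical Kaggle export
    for row in csv.DictReader(f):
        # Each trip record becomes one JSON event on the 'taxi-rides' topic.
        producer.send("taxi-rides", value=row)

producer.flush()
```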