Most Active Speaker

David Kjerrumgaard

Committer on the Apache Pulsar Project | Published Author | International Speaker | Big Data Expert

Las Vegas, Nevada, United States

David is a globally recognized expert in the world of real-time data, messaging systems, and Big Data technologies. He is a distinguished committer on the Apache Pulsar project, showcasing his invaluable contributions to the advancement of this groundbreaking event streaming platform.

As an accomplished author, David has penned the highly regarded book "Pulsar in Action," which serves as a definitive guide for those seeking to master the intricacies of Apache Pulsar. He is also a co-author of "Practical Hive," adding to his list of notable publications.

With numerous speaking engagements around the world, David's influence and expertise extend far beyond his written work. His captivating and informative talks have enlightened audiences on a global scale, making him a sought-after authority on topics related to real-time data and messaging technologies.

In his current role as a Developer Advocate for StreamNative, David focuses on strengthening the Apache Pulsar community through his passion for education and evangelization. He is dedicated to empowering individuals and organizations with the knowledge and skills they need to make the most of real-time data and messaging technologies.

Prior to his role at StreamNative, David held key positions in leading companies. He was a principal software engineer on the messaging team at Splunk, where he honed his expertise in real-time data analytics. Additionally, David has served as the Director of Solutions for two influential Big Data startups, Streamlio and Hortonworks, further solidifying his position as a thought leader in the industry.

David's extensive experience, contributions to open-source projects, commitment to educating and supporting the community, and his extensive international speaking engagements make him a prominent figure in the field of real-time data. His presentations are always enlightening and serve as a valuable resource for those navigating the complexities of modern data and messaging systems.

Awards

  • Most Active Speaker 2024

Area of Expertise

  • Information & Communications Technology

Topics

  • Apache Pulsar
  • Apache BookKeeper
  • Apache Hive
  • Apache NiFi
  • Apache Storm
  • Event Sourcing
  • CQRS & Event Sourcing
  • Event Streaming
  • Event Driven Architecture
  • Distributed Software Systems
  • Realtime Analytics
  • Java
  • Stream Processing
  • Reactive Programming
  • Microservice Architecture
  • IoT
  • IoT Edge

Getting started with serverless: A hands-on guide to using the Pulsar Function Mesh

Function Mesh is a serverless Function-as-a-service (FaaS) framework purpose-built for stream processing applications. It brings powerful event-processing capabilities to your applications by orchestrating multiple Pulsar Functions and Pulsar IO connectors for complex stream processing jobs.

In this talk, we will demonstrate how to utilize Pulsar's Function Mesh to deploy an application comprised entirely of interconnected Pulsar Functions. I will cover the steps necessary to configure and deploy Pulsar Functions inside the Function Mesh and demonstrate more advanced capabilities, such as auto-scaling.

Four Stream Processing Patterns Every Developer Should Know

Stream processing has become increasingly important in today's world of real-time data and event-driven applications. It offers the ability to process and analyze data as it flows through a system, enabling developers to build responsive, scalable, and data-driven solutions. However, harnessing the full power of stream processing requires a solid understanding of fundamental patterns and best practices.

In this talk, we will introduce four essential stream processing patterns that every developer should be acquainted with. These patterns have emerged as critical tools for building efficient, fault-tolerant, and scalable stream processing systems. Whether you're new to stream processing or looking to deepen your expertise, this talk will provide valuable insights and practical guidance.

An Introduction to Apache Pulsar: Powering Real-time Data Streaming

Join me in this informative session as we provide a comprehensive introduction to Apache Pulsar, a cutting-edge distributed messaging and event streaming platform.

Discover the fundamental concepts behind Apache Pulsar's architecture and its practical applications in real-time data streaming. Learn about its core concepts such as topics and subscriptions, persistence and durability, scalability, multi-tenancy, connectivity with other data processing frameworks, and real-world use cases.

Whether you're a developer, architect, or tech enthusiast, this talk will equip you with insights into Apache Pulsar's role in creating efficient, scalable, and real-time data streaming solutions.

Introducing TableView: Pulsar's database table abstraction

In many use cases, applications are using Pulsar consumers or readers to fetch all the updates from a topic and construct a map with the latest value of each key for the messages that were received.

The new TableView consumer supports this access pattern directly in the Pulsar client API and encapsulates the complexity of constructing such a local cache manually. In this talk, we will demonstrate how to use the TableView consumer in a simple application and discuss best practices and patterns for using it.
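To make the access pattern concrete, here is a minimal sketch of the manual approach described above: replaying keyed updates in arrival order and keeping only the latest value per key. It uses plain Java collections rather than the Pulsar client, and the names LatestValueCache and Update are hypothetical. Treating a null value as a delete is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: folds keyed updates into a latest-value-per-key map. */
final class LatestValueCache {

    /** One keyed update; a null value models a delete (an assumption of this sketch). */
    record Update(String key, String value) {}

    private final Map<String, String> latest = new LinkedHashMap<>();

    /** Applies a single update in arrival order. */
    void apply(Update update) {
        if (update.value() == null) {
            latest.remove(update.key());
        } else {
            latest.put(update.key(), update.value());
        }
    }

    /** Replays a whole sequence of updates, such as the backlog of a topic. */
    static LatestValueCache replay(List<Update> updates) {
        LatestValueCache cache = new LatestValueCache();
        updates.forEach(cache::apply);
        return cache;
    }

    String get(String key) {
        return latest.get(key);
    }

    int size() {
        return latest.size();
    }

    public static void main(String[] args) {
        LatestValueCache cache = replay(List.of(
                new Update("sensor-1", "20.5"),
                new Update("sensor-2", "18.0"),
                new Update("sensor-1", "21.0")));
        System.out.println(cache.get("sensor-1")); // prints 21.0
        System.out.println(cache.size());          // prints 2
    }
}
```

A real application would also have to keep this map in sync as new messages arrive, which is the bookkeeping the abstract says a TableView takes over.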

Sink Your Teeth into Streaming at Any Scale

Using low-latency Apache Pulsar we can build up millions of event streams of concurrent data and join them in real time with Apache Flink. Of course we need an ultra-low latency database that can support these workloads to build next-generation IoT, financial and instant analytical transit applications.

By sinking data into ScyllaDB we enable amazingly fast applications that can grow to any size and join with existing data sources.

The next generation of apps is being built now. You must choose the right low-latency scalable platform for these massively data-intensive applications. We'll present a reference architecture based on ScyllaDB + Pulsar + Flink for real-time event streaming analytics.

Let's Go Build Cloud Native Pulsar Apps with Java

Java is a powerful and clean language for building cloud native applications. We can quickly spin up the resources we need with DevOps tools and containers, then build and run wherever we need to. Apache Pulsar has a rich library for Java apps, and we will show you how to build these apps to run on everything from Linux to Mac to NVIDIA devices with ease. Let's start streaming in real time the Cloud Native way.

Building Scalable and Resilient Event-Driven Applications with Apache Pulsar

Event-driven architecture (EDA) has gained significant popularity in recent years for building real-time and scalable applications. Apache Pulsar is a distributed pub-sub messaging system designed for building EDA applications. In this talk, we will explore how Apache Pulsar can be used to build scalable and resilient event-driven applications.

We will start by introducing the key concepts of EDA and the benefits of using an event-driven approach for building modern applications. We will then dive into the core features of Apache Pulsar, including its distributed architecture, support for multiple messaging models, and built-in features for message persistence and replication.

Next, we will discuss best practices for designing event-driven applications with Apache Pulsar, including strategies for managing message routing, handling failure scenarios, and integrating with other distributed systems. We will also cover common use cases for Apache Pulsar, such as real-time stream processing, data ingestion, and message queuing.

Finally, we will demonstrate how to use Apache Pulsar to build a sample event-driven application that showcases the power and flexibility of the platform. Attendees will leave this talk with a solid understanding of Apache Pulsar and how it can be used to build scalable and resilient event-driven applications.

Key Takeaways:

  • Understanding of event-driven architecture and its benefits
  • Knowledge of Apache Pulsar and its core features
  • Best practices for designing event-driven applications with Apache Pulsar
  • Common use cases for Apache Pulsar
  • Practical experience building an event-driven application with Apache Pulsar

Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd.

Starting with version 2.10, the Apache ZooKeeper dependency has been eliminated and replaced with a pluggable framework that enables you to reduce the infrastructure footprint of Apache Pulsar by leveraging alternative metadata and coordination systems suited to your deployment environment.

In this talk, I will walk you through the steps required to use the existing etcd service running inside Kubernetes as Pulsar's metadata store, thereby eliminating the need to run ZooKeeper entirely and leaving you with a ZooKeeper-less Pulsar.

Message Redelivery: An unexpected journey

Apache Pulsar depends upon message acknowledgments to provide at-least-once or exactly-once processing guarantees. With these guarantees, any transmission between the broker, producers, and consumers requires acknowledgment.

But what happens if an acknowledgment is not received? Resending the message introduces the potential for duplicate processing and increases the likelihood of out-of-order processing. Therefore, it is critical to understand the Pulsar message redelivery semantics in order to prevent either of these conditions.

In this talk, we will walk you through the redelivery semantics of Apache Pulsar and highlight some of the control mechanisms available to application developers to control this behavior. Finally, we will present best practices for configuring message redelivery to suit various use cases.
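One common application-side guard against the duplicate processing mentioned above is an idempotent handler that remembers which message IDs it has already applied. The sketch below illustrates that idea in plain Java. It is not taken from the talk, the class name IdempotentHandler is hypothetical, and the unbounded in-memory set is a simplification that a real consumer would need to bound or persist.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: applies each message at most once, even if it is redelivered. */
final class IdempotentHandler {

    private final Set<String> processedIds = new HashSet<>();
    private final List<String> applied = new ArrayList<>();

    /**
     * Returns true if the payload was applied, or false if this message ID
     * was seen before and the delivery was treated as a duplicate.
     */
    boolean handle(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // redelivery: safe to acknowledge again, but do not re-apply
        }
        applied.add(payload);
        return true;
    }

    /** Payloads applied so far, in the order they were first seen. */
    List<String> applied() {
        return List.copyOf(applied);
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        handler.handle("msg-1", "create order");
        handler.handle("msg-2", "charge card");
        handler.handle("msg-1", "create order"); // redelivered after a lost acknowledgment
        System.out.println(handler.applied());   // prints [create order, charge card]
    }
}
```

Deduplication of this kind addresses duplicates only. It does nothing about ordering, which is why the redelivery settings themselves still matter.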

Failure is not an option

Developing a highly-available application requires more than just utilizing fault-tolerant services such as Apache Pulsar in your software stack. It also requires immediate failure detection and resolution including built-in failover when there are data center outages.

Up until now, Pulsar clients could only interact with a single Pulsar cluster and were unable to detect and respond to a cluster-level failure event. In the event of a complete cluster failure, these clients cannot reroute their messages to a secondary/standby cluster automatically.

With the release of Pulsar 2.10, this much-needed automated cluster failover capability has been added to the Pulsar client libraries. In this talk, I will walk you through the changes you need to make inside your application code to take advantage of this new capability.

FaaS without the furious

Function Mesh is a serverless Function-as-a-service (FaaS) framework purpose-built for stream processing applications. It brings powerful event-streaming capabilities to your applications by orchestrating multiple Pulsar Functions and Pulsar IO connectors for complex stream processing jobs.

In this talk, we will demonstrate how to utilize the Function Mesh to deploy an application that consists entirely of interconnected Pulsar Functions. We will cover the steps necessary to configure and deploy Pulsar Functions inside the Function Mesh, and demonstrate some of the more advanced capabilities such as auto-scaling.

Event Sourcing with Apache Flink

The event sourcing pattern is a well-known pattern that is gaining widespread adoption among microservices developers as a way of adhering to the "database per service" principle.

Event sourcing relies upon an append-only store to record the entire sequence of actions taken upon the system rather than a more traditional relational database. It is often combined with the CQRS pattern, which uses materialized views to convert event-based data from one (or more) sources into a format that is more suitable for querying.

In this talk, we will build a simple microservice application that demonstrates the benefits of combining Flink SQL with Apache Pulsar to pre-calculate these materialized views and continuously update them when new events arrive.

Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)

Introducing the FLiPN stack, which combines Apache Flink, Apache NiFi, Apache Pulsar, and other Apache tools to build fast applications for IoT, AI, and rapid ingest.

FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.

Tools
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI

References
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Apache Pulsar with MQTT for Edge Computing

Today we will span from the edge to any and all clouds to support data collection, real-time streaming, sensor ingest, edge computing, IoT use cases, and edge AI. Apache Pulsar allows us to build computing at the edge and produce and consume messages at scale in any IoT, hybrid, or cloud environment. Apache Pulsar supports MoP (MQTT-on-Pulsar), which allows the MQTT protocol to be used for high-speed messaging.

We will teach you to quickly build scalable open-source streaming applications, whether you are running in containers, pods, edge devices, VMs, on-premises servers, moving vehicles, or any cloud.

Deploying Machine Learning Models with Pulsar Functions

In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. In order to provide a prediction in real-time, the model usually receives a single data point from the caller, and is expected to provide an accurate prediction within a few milliseconds. 

Throughout this talk, I will demonstrate the steps required to deploy a fully trained ML model that predicts the delivery time for a food delivery service based upon real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.

Using Apache Pulsar to Provide Real-time IoT analytics on the Edge

The business value of data decreases rapidly after it is created, particularly in use cases such as fraud detection, cybersecurity, and real-time system monitoring. The high-volume, high-velocity datasets used to feed these use cases often contain valuable, but perishable, insights that must be acted upon immediately.

In order to maximize the value of their data and reduce their decision latency, enterprises must fundamentally change their approach to processing real-time data by focusing on the perishability of the insights that they are deriving from their real-time data streams.

Generating timely insights in a high-volume, high-velocity data environment is challenging for several reasons. First, as the volume of data increases, so does the amount of time required to transmit it back to the datacenter and process it. Second, as the velocity of the data increases, the data and the insights derived from it lose their value more quickly.

In this talk, we will present a solution based on Apache Pulsar Functions that significantly reduces decision latency by using probabilistic algorithms to perform analytic calculations on the edge.
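The abstract does not say which probabilistic algorithms are used. As one illustrative example of the family, the sketch below implements a count-min sketch, which estimates how often each item occurs in a stream using a fixed amount of memory. The class name, the hash mixing, and the sizing are all assumptions made for illustration. Estimates can be too high when items collide, but they are never lower than the true count.

```java
/** Hypothetical sketch of a count-min sketch for approximate frequency counting. */
final class CountMinSketch {

    private final int width;
    private final long[][] counts; // one row of counters per hash function

    CountMinSketch(int depth, int width) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.width = width;
        this.counts = new long[depth][width];
    }

    /** Derives a per-row bucket by mixing the row index into the item's hash code. */
    private int bucket(String item, int row) {
        int h = item.hashCode() ^ ((row + 1) * 0x9E3779B9);
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, width);
    }

    /** Records one occurrence of the item in every row. */
    void add(String item) {
        for (int row = 0; row < counts.length; row++) {
            counts[row][bucket(item, row)]++;
        }
    }

    /** Returns the smallest counter across rows: an upper bound on the true count. */
    long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < counts.length; row++) {
            min = Math.min(min, counts[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024);
        for (int i = 0; i < 5; i++) {
            sketch.add("device-42");
        }
        System.out.println(sketch.estimate("device-42")); // prints 5
    }
}
```

The memory used is depth × width counters regardless of how many distinct items pass through, which is what makes structures like this attractive on constrained edge devices.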

Event Sourcing with Apache Pulsar and Apache Quarkus

I believe that event-sourcing is the best way to implement persistence within a microservices architecture, but it hasn't always been the easiest solution to implement. In this talk, I will demonstrate how these two exciting technologies can be combined into one killer stack that simplifies event sourcing development.

I will outline how to use DDD and CQRS concepts as a guide for developing an event sourcing food-delivery application based on Apache Pulsar and Quarkus that is 100% cloud native.

Throughout this talk, I will demonstrate several different event sourcing design patterns across multiple microservices to feed multiple real-time dashboards that provide driver location tracking and heatmaps. I will also highlight some patterns for using an event streaming platform as your event store.

A hands-on guide for developers looking to implement event sourcing.

Unlocking the Power of Polyglot Messaging with Apache Pulsar and Spring

Have you been waiting for someone to create a unified platform and framework that allows you to develop messaging applications supporting multiple protocols, such as MQTT, AMQP, and Kafka, simplifying your messaging infrastructure and enhancing its interoperability and flexibility?

Apache Pulsar and Spring are two powerful technologies that can be used to build modern, distributed applications. While Apache Pulsar provides a scalable and flexible messaging platform with multi-protocol support, Spring provides rich features for building enterprise-grade applications. In this talk, we will explore how to use Apache Pulsar and Spring together to build polyglot messaging applications that communicate with different messaging systems and applications using different protocols and languages.

We will then dive into the details of Apache Pulsar's multi-protocol capability and how it can be used with the new Pulsar Spring Boot Starter library to build polyglot messaging applications. Next, we will demonstrate a real-world polyglot messaging application that uses different messaging protocols to exchange messages on Apache Pulsar.

By the end of this talk, you will have a deeper understanding of how to develop messaging applications using Apache Pulsar and Spring Framework and how the Pulsar Spring Boot Starter can streamline your messaging infrastructure and improve the interoperability and flexibility of your messaging systems.

Join us to learn how to leverage the power of Spring Framework and Apache Pulsar to build robust and scalable messaging applications that support multiple protocols.

Building Reactive Applications with Apache Pulsar and Reactive Streams

Building reactive applications is crucial for delivering real-time and event-driven experiences in today's fast-paced and highly responsive digital landscape. Join this interactive session to learn how to build reactive applications using Apache Pulsar and Reactive Streams and witness live demonstrations showcasing their combined power.

By the end of this session, you will clearly understand how Apache Pulsar and Reactive Streams can be seamlessly integrated to build reactive applications. You will be equipped with practical knowledge, code samples, and hands-on experience, enabling you to apply these concepts to your projects.

Join us for this immersive session with live demonstrations, practical examples, and interactive discussions. Unlock the potential of building reactive applications with Apache Pulsar and Reactive Streams, and learn how to create highly responsive, event-driven systems that can meet the demands of modern applications.

Event Sourcing, the good, the bad, the ugly

Event sourcing is a well-known pattern gaining widespread adoption among microservices developers as a way of adhering to the "database per service" principle. It relies upon an append-only store to record the entire sequence of actions taken upon the system rather than a more traditional relational database. It is often combined with the CQRS pattern, which uses materialized views to convert event-based data from one (or more) sources into a format that is more suitable for querying.

In this talk, we will present different approaches to calculating these materialized views using a simple microservice application that demonstrates the various techniques. We will discuss the pros and cons of each approach and highlight some of the common pitfalls and best practices we have learned from implementing these patterns at scale.
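As a point of reference for the pattern, here is a minimal in-memory sketch of an append-only event log with a materialized view on the query side. It shows two simple ways of computing the view: updating it incrementally as each event is appended, and rebuilding it from scratch by replaying the log. The ledger domain and all names are hypothetical and are not the application used in the talk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an append-only event log plus a balance-per-account view. */
final class EventSourcedLedger {

    /** An immutable fact: the given account changed by amountCents (negative for withdrawals). */
    record Event(String accountId, long amountCents) {}

    private final List<Event> log = new ArrayList<>();          // write side: append-only
    private final Map<String, Long> balances = new HashMap<>(); // read side: materialized view

    /** Appends the event and incrementally updates the view. */
    void append(Event event) {
        log.add(event);
        balances.merge(event.accountId(), event.amountCents(), Long::sum);
    }

    /** Query served from the view, without scanning the log. */
    long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }

    /** Read-only copy of the event history. */
    List<Event> log() {
        return List.copyOf(log);
    }

    /** Rebuilds the same view from scratch by replaying a sequence of events. */
    static Map<String, Long> rebuild(List<Event> events) {
        Map<String, Long> view = new HashMap<>();
        for (Event event : events) {
            view.merge(event.accountId(), event.amountCents(), Long::sum);
        }
        return view;
    }

    public static void main(String[] args) {
        EventSourcedLedger ledger = new EventSourcedLedger();
        ledger.append(new Event("alice", 1_000));
        ledger.append(new Event("alice", -250));
        System.out.println(ledger.balance("alice"));            // prints 750
        System.out.println(rebuild(ledger.log()).get("alice")); // prints 750
    }
}
```

Because the log is the source of truth, the view can be discarded and recomputed at any time, at the cost of a full replay.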

Error Handling Patterns in Pulsar

In this talk, I will introduce different ways to handle errors and message retries in your event streaming applications. Apache Pulsar provides several built-in mechanisms to handle processing failures, including negative acknowledgments, retry topics, dead-letter queues, etc. This proliferation of options can often lead to confusion as to which mechanism is best suited for handling errors in your application.

This hands-on talk presents a collection of error-handling patterns that maintain in-order processing of messages in the event of failure, covering different approaches as they gradually increase in complexity. Working code examples of the patterns will demonstrate these concepts throughout the talk.

At the end of this talk, you will be better equipped to properly handle errors within your Pulsar application by leveraging the patterns introduced. Concrete implementations of these patterns will also be made available after the talk.
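To give a flavor of what such a pattern can look like, here is a simplified, broker-free sketch of one approach: retry each message in place a bounded number of times, then set it aside in a dead-letter list so that later messages are still processed in their original order. It is an illustration only, not one of the implementations from the talk, and the class name, the Result record, and the maxAttempts parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: bounded in-place retries followed by dead-lettering. */
final class RetryThenDeadLetter {

    /** Outcome of a run: messages handled successfully and messages given up on. */
    record Result(List<String> processed, List<String> deadLettered) {}

    /**
     * Processes messages strictly in order. Each message is attempted up to
     * maxAttempts times; if every attempt fails it goes to the dead-letter list
     * and processing continues with the next message.
     */
    static Result run(List<String> messages, Predicate<String> handler, int maxAttempts) {
        List<String> processed = new ArrayList<>();
        List<String> deadLettered = new ArrayList<>();
        for (String message : messages) {
            boolean succeeded = false;
            for (int attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
                succeeded = handler.test(message); // true means the handler succeeded
            }
            (succeeded ? processed : deadLettered).add(message);
        }
        return new Result(processed, deadLettered);
    }

    public static void main(String[] args) {
        Result result = run(List.of("a", "poison", "b"), m -> !m.equals("poison"), 3);
        System.out.println(result.processed());    // prints [a, b]
        System.out.println(result.deadLettered()); // prints [poison]
    }
}
```

Retrying in place keeps ordering simple but stalls everything behind a failing message, which is the kind of trade-off that motivates more elaborate patterns.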

Apache Pulsar: Why It's Time to Move Beyond Apache Kafka

Apache Kafka has been a dominant player in the world of real-time data streaming for years, but it's time to consider alternatives. Apache Pulsar has emerged as a powerful and versatile messaging system that offers distinct advantages over Kafka. This talk will explore why Apache Pulsar is better than Apache Kafka for real-time data streaming.

This talk will provide an overview of Apache Pulsar and Apache Kafka and explain the differences between the two platforms. We will then explore the advantages of using Apache Pulsar over Apache Kafka, including better scalability, multi-tenancy, tiered storage, and geo-replication.

We will explore how Apache Pulsar's architecture is designed to easily handle massive data volumes and provide multi-tenancy, enabling multiple departments within your organization to share a single cluster without affecting each other's performance.

We will delve into the more extensive messaging features Apache Pulsar provides, such as support for work queue messaging semantics, key-shared subscriptions, and negative acknowledgments, which are not present in Apache Kafka. Additionally, we will explore Apache Pulsar's support for multiple messaging protocols, including Kafka, MQTT, and AMQP, enabling seamless integration with existing systems.

Finally, we will discuss some real-world use cases of Apache Pulsar, such as stream processing, microservices communication, and real-time analytics, showcasing the benefits that Apache Pulsar can bring to these applications.

Attendees will leave this talk with a clear understanding of why Apache Pulsar is a better choice than Apache Kafka for real-time data streaming. They will gain practical knowledge of how to use Apache Pulsar and how to migrate from Apache Kafka to Apache Pulsar. This talk is suitable for developers, architects, and data engineers interested in exploring alternatives to Apache Kafka and building scalable, reliable, and efficient real-time data streaming systems.

Multi-Protocol Support for Stream Processing

Apache Pulsar has emerged as a leading open-source messaging system designed to handle real-time data streams at scale. Its many features include support for multiple messaging protocols, making it a versatile platform for modern data architectures. With Pulsar's multi-protocol architecture, users can leverage a wide range of messaging and streaming protocols, including MQTT, AMQP, and Kafka.

I will discuss how Apache Pulsar's multi-protocol support enables users to easily migrate from legacy messaging systems, such as RabbitMQ or Kafka, to a modern, cloud-native platform without modifying their legacy application code.

Next, I will demonstrate how a single Apache Pulsar instance can simultaneously support different messaging protocols, including Kafka, MQTT, and AMQP. I will also highlight Pulsar’s ability to allow clients using different protocols to communicate with one another seamlessly.

By the end of this presentation, attendees will have a thorough understanding of how Apache Pulsar's multi-protocol support can help them achieve seamless integration with a wide range of messaging and streaming protocols.

Segment-Based Storage vs. Partition-Based Storage: Which is Better for Real-Time Data Streaming?

Storage is a critical component of any real-time data streaming system, and the choice of storage model can significantly affect the system's performance, scalability, and reliability. Two popular storage models for real-time data streaming systems are segment- and partition-based storage.

In this talk, we will start by explaining what segment-based and partition-based storage means and how they work. We will explore the differences between the two storage models, including how data is organized, how data is stored, and how data is accessed.

We will discuss how a segment-based storage model provides better scalability, performance, and reliability than the partition-based model and how segment-based storage solves some deficiencies of the partition-based model, including the need to re-partition topics just to increase the storage capacity of a topic.

Attendees will leave this talk with a clear understanding of the differences between segment- and partition-based storage and how they affect real-time data streaming systems' performance, scalability, and resiliency.

Kafka-on-Pulsar: The Easy Way to Kafka.

Apache Kafka has been a dominant player in the world of real-time data streaming for years. Still, it is not without many well-known issues with scalability, data resiliency, and the outages associated with topic rebalancing. What if there was an easier way?

If you have invested heavily in application development around Apache Kafka, you may feel that migrating your existing code base to another platform is cost-prohibitive. In this talk, I will show you this is not the case!

Kafka-on-Pulsar (KoP) is an open-source project that provides a Kafka API interface on top of Apache Pulsar, allowing you to leverage Pulsar's scalability and performance while maintaining Kafka binary compatibility within your applications. This makes migrating your Kafka applications to Pulsar easier and more seamless than ever.

I will demonstrate how KoP makes migration from Kafka to Pulsar easier and more straightforward and highlight the critical features of KoP, including its compatibility with the Kafka client APIs, Kafka connectors, KStreams, and Schema Registry.

Next, I will provide real-world examples of how KoP has helped organizations migrate their Kafka applications to Pulsar with minimal disruption. I will also discuss how Pulsar’s two-tiered architecture allows it to scale seamlessly without the pain of topic rebalancing and some of the additional benefits of Apache Pulsar, including built-in geo-replication and multi-protocol support.

By the end of this presentation, attendees will have a thorough understanding of how Kafka-on-Pulsar can make migration from Kafka to Pulsar easier, faster, and more efficient. They will learn how to leverage KoP to migrate their Kafka applications to Pulsar with minimal disruption while benefiting from its superior performance, scalability, and multi-protocol support.

From Zero to Real-Time Streaming Hero: A Hands-On Developer Workshop

In today's fast-paced digital landscape, the ability to process and deliver data in real time is paramount for crafting responsive and interactive applications. This workshop is designed to equip participants with the skills to harness the power of real-time streaming, enabling them to create applications that deliver instant, data-driven experiences.

Workshop Highlights:

In-Depth Understanding of Real-Time Principles: Delve into the foundational concepts of real-time data processing and explore the event-driven paradigm, gaining a strong grasp of how to create applications that respond to data instantaneously.

Hands-On Implementation: Through interactive exercises, participants will gain practical experience in setting up data pipelines, configuring streams, and employing tools for efficient data ingestion and transformation.

Seamless Application Integration: Learn how to seamlessly integrate real-time streaming capabilities into applications, enabling them to adapt and react in real time to incoming data and events.

Scaling for Optimal Performance: Discover strategies for optimizing applications to ensure high performance, reliability, and scalability as the volume of data increases.

Best Practices and Use Cases: Explore real-world use cases that showcase the transformative impact of real-time streaming across industries, and learn best practices for designing effective streaming solutions.

Guidance from Industry Experts: Benefit from the knowledge and experience of our seasoned facilitators who will provide guidance, insights, and one-on-one assistance throughout the workshop.

Whether you're an experienced developer aiming to elevate your skill set or a newcomer eager to explore the world of real-time streaming, this workshop caters to diverse skill levels. By the workshop's end, participants will have a solid understanding of real-time streaming principles, the confidence to implement real-time data processing, and the capability to craft applications that redefine responsiveness. Don't miss this opportunity to unlock the potential of real-time streaming and become a hero in delivering dynamic, instantaneous experiences.

Developing Cloud-Native Streaming Applications with Apache Pulsar

Embrace the power of real-time data processing with Apache Pulsar, a next-generation cloud-native messaging and event streaming platform. This session offers an immersive exploration into the world of building cutting-edge streaming applications using Apache Pulsar.

The talk will commence with an overview of the key principles behind cloud-native architecture, emphasizing the role of Apache Pulsar in delivering seamless, scalable, and resilient streaming solutions. Attendees will gain insights into the core concepts of event-driven architectures, microservices, and containerization tailored specifically for Apache Pulsar.

We will delve into the unique features and capabilities that make Apache Pulsar an ideal choice for developing cloud-native streaming applications. From its architecture that supports multi-tenancy and horizontal scalability to its ability to handle high-throughput, low-latency data streams, learn how Apache Pulsar simplifies the complexities of real-time data processing.

Practical demonstrations will guide participants through the process of building cloud-native streaming applications with Apache Pulsar. Topics covered include setting up Pulsar clusters, leveraging Pulsar Functions for stream processing, and ensuring fault tolerance in distributed environments.

Whether you're a seasoned developer or new to the world of streaming applications, this session aims to provide actionable insights, best practices, and real-world use cases that showcase the efficacy of Apache Pulsar. By the end of the talk, attendees will be equipped with the knowledge and tools needed to embark on their journey of developing cloud-native streaming applications with confidence and efficiency.

Join us to unravel the potential of Apache Pulsar and discover how it can elevate your streaming application development experience in this exciting era of cloud-native computing.

Oxia - A Horizontally Scalable Alternative to Apache Zookeeper.

For over a decade, Apache Zookeeper has played a crucial role in maintaining configuration information and providing synchronization within distributed systems. Its unique ability to provide these features made it the de facto standard for distributed systems within the Apache community.

Despite its widespread adoption, there is an emerging trend toward eliminating the dependency on Zookeeper altogether and replacing it with an alternative technology. The most notable example is the KRaft subproject within the Apache Kafka community.

While the KRaft project achieved its goal of making Kafka more self-contained by eliminating the need for an external Zookeeper ensemble, its benefits are limited to the Kafka community.

This talk introduces Oxia, a subproject within the Apache Pulsar community aimed at providing a horizontally scalable alternative to the traditional Zookeeper-based consensus architecture. The goal of Oxia is two-fold:

Develop a consensus and coordination system that doesn’t suffer from Zookeeper’s horizontal scalability limitations.

Create a compelling Zookeeper replacement that can be used across the entire Apache ecosystem and not just Apache Pulsar.

This talk will discuss Apache Zookeeper’s inherent scalability issues and demonstrate how Oxia’s architecture is designed to eliminate them entirely. We will also highlight how Oxia’s Java client library makes it easy for projects across the Apache ecosystem to utilize Oxia as a Zookeeper replacement.

Everything you wanted to know about Streaming Lakehouses but were afraid to ask.

The lakehouse represents a transformative approach to data management, merging the best attributes of data lakes and traditional data warehouses. It combines data lake scalability and cost-effectiveness with data warehouse reliability, structure, and performance.

In this talk, we will guide you through the process of building a data ingestion and transformation pipeline that allows you to stream data from the edge all the way to your streaming lakehouse using an entirely open-source technology stack. We will show you how easy it is to offload your streaming data to tiered storage in a lakehouse-native format, such as Delta Lake, Apache Hudi, or Apache Iceberg.

We will conclude by demonstrating how easy it is to query your lakehouse-formatted data stream using query engines like Flink or Spark, allowing you to analyze streaming data quickly and cost-effectively.

An Introduction to Streaming Lakehouse Storage with Apache Pulsar

In the realm of modern data architectures, the notion of a "streaming lakehouse" has emerged as a comprehensive solution for managing both batch and streaming data within a single, adaptable repository. This session serves as a primer on the concept of streaming lakehouse storage and its integration with Apache Pulsar, a powerful platform for real-time messaging and event streaming.

In this talk, you will discover how Apache Pulsar lays the foundation for building a streaming lakehouse storage solution. This presentation will showcase several key aspects, including the seamless ingestion of data streams into Pulsar topics, efficient storage management utilizing Pulsar's tiered storage, integration with lakehouse formats such as Apache Hudi or Delta Lake, and the facilitation of real-time analytics through frameworks like Apache Flink or Apache Spark Streaming.

Building an Integrated Data Streaming Platform with Apache Pulsar and Apache Flink

Discover a groundbreaking integrated data streaming platform that seamlessly merges Apache Pulsar's limitless storage scalability with Apache Flink's real-time processing prowess. This talk illuminates the synergies between these open-source projects.

The presentation provides insights into real-world use cases that benefit from this integration, underscoring the platform's unique strengths, including unmatched scalability and seamless fault tolerance.

In summary, this integrated data streaming platform combines Apache Pulsar's scalability with Apache Flink's real-time processing capability, offering an open-source alternative to proprietary solutions in the marketplace.

Streaming Real-time Data into LLM Models

In today's data-driven world, the ability to harness real-time data and transform it into actionable insights is a paramount challenge. Large Language Models (LLMs) have emerged as powerful tools for generating prompts and uncovering valuable information from vast datasets. This talk will unveil a novel approach to address this challenge by utilizing Apache Pulsar, a highly scalable event streaming platform, as the conduit for real-time data streaming into LLM models.

Our discussion will delve into the seamless integration of Apache Pulsar, showcasing its capabilities in facilitating the continuous flow of real-time data into LLM models. We will explore practical use cases and share insights into how this integration empowers organizations to stay at the forefront of data-driven decision-making.

Community Over Code EU 2024 Sessionize Event

June 2024 Bratislava, Slovakia

Real-Time Analytics Summit 2024 Sessionize Event

May 2024 San Jose, California, United States

Data Saturday Phoenix 2024 Sessionize Event

March 2024 Phoenix, Arizona, United States

ApacheCon North America

TableView: An introduction to Pulsar's Database Table Abstraction

October 2022 New Orleans, Louisiana, United States
