Building a real-time analytics application with Apache Pulsar and Apache Pinot

Apache Pulsar is a distributed, open source pub-sub messaging and streaming platform for real-time workloads that manages hundreds of billions of events per day. It runs in production, processing millions of messages per second across millions of topics, and has been adopted by companies such as Yahoo!, Verizon Media, and Splunk.

In this talk we'll learn how to run analytical queries on top of Pulsar's event data with Apache Pinot, a real-time distributed OLAP datastore used to deliver scalable, low-latency real-time analytics.

We'll explore the integration between Pulsar and Pinot, explaining the features that it supports and the challenges faced while building it.

After that we'll demonstrate how to build a real-time analytics dashboard with these technologies. We'll stream data into Pulsar using its Python client, ingest that data into a Pinot real-time table, and write some basic queries using Pinot's Python SDK. Once we've done that, we'll bring everything together with an auto-refreshing dashboard built with Plotly Dash, so that we can see changes to the data as they happen.
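As a rough idea of what the first and last steps of that demo might look like, here is a minimal sketch, not the code from the talk. It assumes the `pulsar-client` and `pinotdb` packages, a local Pulsar broker at `pulsar://localhost:6650`, a Pinot broker on port 8099, and a Pinot real-time table that is already configured to ingest from the topic. The topic and table name `events`, the `eventType` column, and the helper function names are all hypothetical.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize one event as UTF-8 JSON (the payload format assumed here)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_events(events, service_url="pulsar://localhost:6650", topic="events"):
    """Send each event to a Pulsar topic. URL and topic name are assumptions."""
    import pulsar  # from the pulsar-client package, imported lazily

    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        for event in events:
            producer.send(encode_event(event))  # blocks until the broker acks
    finally:
        client.close()


def count_by_type(host="localhost", port=8099, table="events"):
    """Run a basic aggregation via the Pinot broker using pinotdb's DB-API."""
    from pinotdb import connect  # from the pinotdb package, imported lazily

    conn = connect(host=host, port=port, path="/query/sql", scheme="http")
    cursor = conn.cursor()
    # eventType is a hypothetical column in the hypothetical table.
    cursor.execute(f"SELECT eventType, COUNT(*) FROM {table} GROUP BY eventType")
    return cursor.fetchall()
```

With both services running, `publish_events([{"eventType": "click", "ts": 1}])` followed by `count_by_type()` should reflect the new event once Pinot has consumed it. A Dash app could call a function like `count_by_type` from a callback driven by a `dcc.Interval` component to get the auto-refreshing behaviour.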

Mary Grygleski

AI Practice Lead, TED/x Speaker, Technical Advocate, Java Champion, President of Chicago-JUG, Chapter Co-Lead of AICamp-Chicago

Chicago, Illinois, United States
