Most Active Speaker

Timothy Spann

Timothy Spann

Senior Sales Engineer

Princeton, New Jersey, United States

Actions

Tim Spann is the Senior Sales Engineer - Financial Services working withSnowflake, Cortex AI, Apache Polaris, Streamlit, Cloud Data, Vectors, Data, Generative AI, ML, Hugging Face, Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Zilliz/Milvus, Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.

https://medium.com/@tspann
https://www.youtube.com/@FLaNK-Stack
https://www.datainmotion.dev/p/about-me.html
https://dzone.com/users/297029/bunkertor.html

Awards

  • Most Active Speaker 2024
  • Most Active Speaker 2023
  • Most Active Speaker 2022

Area of Expertise

  • Business & Management
  • Finance & Banking
  • Government, Social Sector & Education
  • Information & Communications Technology
  • Media & Information

Topics

  • apache nifi
  • apache flink
  • apache kafka
  • minifi
  • iot
  • AI
  • IOT and Android Things
  • IoT
  • Industrial IoT
  • IIOT
  • Deep Learning
  • cloud
  • Cloud & Infrastructure
  • Cloud & DevOps
  • Cloud data lake use cases
  • Big Data
  • AWS Databases
  • BI on the data lake
  • All things data
  • Streaming Data Analytics
  • Data Streaming
  • Event Streaming
  • Streaming
  • AWS Cloud Containers Kubernetes Streaming Data Big Data HPC
  • Apache Pulsar
  • AWS Data
  • AI STUFF: Big Data Quantum computing & Machine Learning
  • All things data in Azure AWS GCP and on-premises
  • Analytics and Big Data
  • AWS Data & AI
  • Azure Data
  • Iot Edge
  • #IoT
  • Automotive IoT
  • IoT Edge AI
  • Database
  • Data Science
  • Azure Data Platform
  • Azure SQL Database
  • Databases
  • Data Science & AI
  • Azure Data Factory
  • Data Visualization
  • Azure Data & AI
  • Data Warehousing
  • Data Management
  • Database Administration
  • data engineering
  • Data Platform
  • Azure Data Lake
  • Microsoft Data Platform
  • Power BI Dataflows
  • streaming sql
  • SQL
  • ai
  • Artificial Inteligence
  • Internet of Things (IoT)
  • artificial intelligence risk
  • Machine Learning and Artificial Intelligence
  • PostgreSQL
  • Apache Iceberg
  • Amazon Web Services
  • microservices
  • Microservice Architecture
  • generative ai
  • Java with Generative AI & LLMs
  • ​​​​​​​The Generative AI LLM Revolution (ChatGPT)
  • milvus
  • vector database
  • Vector Database
  • Vector Databases
  • genai
  • generativeai
  • Applied Generative AI
  • GenAI
  • azure genai
  • Generative AI Use Cases
  • Enterprise Patterns in Generative AI
  • Generative Art
  • Generative Coding
  • generative adversarial networks
  • Synthetic Data Generation using Generative AI and Amazon Bedrock
  • Machine Learning/Artificial Intelligence
  • Cloud Native

Building Apache NiFi 2.0 Python Processors

In this talk, I will delve into the world of Apache NiFi 2.0 Python processors, exploring the capabilities they offer and demonstrating how to build custom processors to enhance your data processing pipelines. Attendees will gain a deep understanding of the integration points between NiFi and Python, enabling them to leverage the extensive libraries and frameworks available in the Python ecosystem. – Introduction to Apache NiFi 2.0 – Python Processors Deep Dive – Build your own custom Python Processor – Integrating Python Libraries and Frameworks – Debugging and Troubleshooting
By the end of this talk, participants will have a comprehensive understanding of building and optimizing Apache NiFi 2.0 Python processors, enabling them to integrate Python seamlessly into their data processing workflows. This session is suitable for data engineers, architects, and anyone interested in harnessing the combined power of Apache NiFi and Python for efficient data integration and flow management. One of the main uses is to build prompts and call ChatGPT.

NiFI Man: “We're here – but should we have come?”

The last few years, travel has been tough with diseases, air quality problems, fires, airline delays, wars and other events. The only way to know is to measure the conditions and make that decision. So using ASF projects including NiFi, MiNiFi, Iceberg, Kafka, Arrow, Calcite, Tika and Flink I will do just that.

There are so many streams of data to look at to determine if it's worth the trip from flights, delays, the weather, air quality, local sensors, travel advisories, reviews, social media, local transit and more.

So I looked at everything and determined yes Denver is worth the trip from sunny New Jersey. And I'll show you how to make those decisions too.

https://medium.com/@tspann/harnessing-the-power-of-nifi-building-a-seamless-flow-to-ingest-pm2-5-90246393fcab

https://medium.com/@tspann/building-a-travel-advisory-app-with-apache-nifi-in-k8-969b44c84958

Yes, I love "Travel Man".

Adding Generative AI to Real-Time Streaming Pipelines

In this talk I walk through various use cases where bringing real-time data to LLM solves some interesting problems.

In one case we use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated via NiFi and Kafka. In another case NiFi ingests live travel data and feeds it to HuggingFace and OLLAMA LLM models for summarization. I also do live chatbot. We also augment LLM prompts and results with live data streams. All with ASF projects. I call this pattern FLaNK AI.

https://github.com/tspannhw/FLaNK-HuggingFace-BLOOM-LLM
https://medium.com/@tspann/mixtral-generative-sparse-mixture-of-experts-in-dataflows-59744f7d28a9
https://medium.com/@tspann/building-an-llm-bot-for-meetups-and-conference-interactivity-c211ea6e3b61

Building Real-time Pipelines with FLaNK: A Case Study with Transit Data

In this session, we will explore the powerful combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines. We will present a case study using the FLaNK-MTA project, which leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). By integrating Flink, NiFi, and Kafka, FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.

Takeaways:

Understanding the integration of Apache Flink, Apache NiFi, and Apache Kafka for real-time data processing
Insights into building scalable and fault-tolerant data processing pipelines
Best practices for data collection, transformation, and analytics with FLaNK-MTA as a reference
Knowledge of use cases and potential business impact of real-time data processing pipelines

Unlocking Financial Data with Real-Time Pipelines

Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.
Key Points to be Covered:
Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing.
Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.
Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. Integration of Kafka with financial data sources and consumers.
Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources.
Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions.
Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies.
Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability and performance optimization techniques.
Conclusion: In this talk, we will demonstrate the power of combining Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to unlock financial data's true potential. Attendees will gain insights into how these technologies can empower financial institutions to make informed decisions, respond to market changes swiftly, and comply with regulations effectively. Join us to explore the world of real-time data pipelines and revolutionize financial data management.

Empowering IoT with Real-time Stream Processing: Flink, NiFi, and Pulsar

The rapid growth of the Internet of Things (IoT) has generated an enormous volume of data that organizations must harness to gain valuable insights and drive actionable outcomes. To address the challenges of processing IoT data at scale, this talk proposal aims to explore the powerful combination of Apache Flink, Apache NiFi, and Apache Pulsar. We will delve into how these cutting-edge technologies can empower IoT applications with real-time stream processing, seamless data integration, and reliable message queuing.

Building a Full Lifecycle Streaming Data Pipeline

In this talk, we will delve into the process of building a full lifecycle streaming data pipeline using Apache Airflow, Apache Kafka, and Apache Iceberg. We will cover the key features and capabilities of each tool, and demonstrate how they can be integrated to create a robust and efficient pipeline for handling real-time streaming data.

By combining the power of Apache Kafka, Apache Airflow, Apache NiFi and Apache Iceberg, developers can build a full lifecycle streaming data pipeline that is capable of efficiently handling real-time data at scale. This talk will provide a comprehensive overview of how to utilize these tools to build a reliable and effective streaming data pipeline.

Building a Real-Time IoT Application with Apache Pulsar and Apache Pinot

We will walk step-by-step with live code and demos on how to build a real-time IoT application with Pinot + Pulsar.

First, we stream sensor data from an edge device monitoring location conditions to Pulsar via a Python application.

We have our Apache Pinot "realtime" table connected to Pulsar via the pinot-pulsar stream ingestion connector.

Our data streams into the stream, and we visualize it with Superset.

https://medium.com/@tspann/building-a-real-time-iot-application-with-apache-pulsar-and-apache-pinot-1e3baf8c1824

Source Code
https://github.com/tspannhw/pulsar-thermal-pinot

Reference
https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/apache-pulsar

https://dev.startree.ai/docs/pinot/recipes/pulsar

Sink Your Teeth into Streaming at Any Scale

Using the low-latency Apache Pulsar we can build up millions of streams of concurrent data and join them in real time with Apache Flink. We need an ultra-low latency database that can support these workloads to build next-generation IoT, financial and instant analytical transit applications

By sinking data into ScyllaDB we enable amazingly fast applications that can grow to any size and join with existing data sources.

The next generation of apps is being built now, you must choose the right low-latency scalable platform for these massively data-intensive applications. ScyllaDB + Pulsar + Flink is that platform. Choose Open, Choose Fast, and Make the right choice.

Building Modern Data Streaming Apps

In my session, I will show you some best practices I have discovered over the last 7 years in building data streaming applications including IoT, CDC, Logs, and more.

In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into ScyllaDB.

We use the best streaming tools for the current applications with FLiPN and FLaNK. https://www.flipn.app/

Deploying Machine Learning Models with Pulsar Functions

In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. In order to provide a prediction in real-time, the model usually receives a single data point from the caller, and is expected to provide an accurate prediction within a few milliseconds. 

Throughout this talk, I will demonstrate the steps required to deploy a fully-trained ML that predicts the delivery time for a food delivery service based upon real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.

Architecting Your First Event Driven Serverless Streaming Applications

Once you have built a topic in Apache Pulsar, you will quickly see the need to build event-driven applications. This can require a lot of decisions on what framework to use, where to run it, how to deploy it, and how to manage these applications.

I will walk you through step-by-step in building Pulsar Functions which is the easy way to design, test, develop, integrate, deploy, monitor, and manage serverless streaming applications in Java and Python.

Together we will build a full application as an Apache Pulsar function and enjoy the power of running it in the cloud for IoT events and add any routing, transformation, or machine learning that we need to accomplish our business requirements.

BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH JAVA MICROSERVICES

In this talk we will walk through how to build event streaming applications as functions running in with cloud native messaging via Apache Pulsar that run on near infinite scale in any cloud, docker or K8. We will show you have to deploy ML functions to transform real-time data for IoT, Streaming Analytics and many other use cases. After this talk you will be able to build Java microservices with ease and deploy them anywhere utilizing the open source unified streaming and messaging platform, Apache Pulsar. Finally, we will show you have to add dashboards with Web Sockets, no code data sinks, integrate with Apache NiFi data pipelines, SQL Reports with Apache Spark and finally continuous ETL with Apache Flink. I have built many of these applications for many organizations as part of the FLiPN Stack. Let's build next generation applications today regardless if your data is REST APIs, Sensors, Logs, NoSQL Sources, Events or Database tables.

https://github.com/tspannhw?tab=repositories&q=FLiP&type=source

Building FLiPN Stack Edge AI Applications

Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest with Java, C#, Python or Golang.

FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.

Apache Pulsar enables Java applications to communicate asynchronously at any scale, geo-replicate and interact with non JVM applications. Pulsar also acts as a function mesh to run Java functions as a FaaS triggered by Events. All of this is open source and includes an integrated Schema Registry with support for JSON, Avro, Text and ProtoBuf schemas.

Tools
Java, Golang, Python, C#, Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI

References
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html

Apache Pulsar Development 101 with Python

In this session I will get you started with real-time cloud native streaming programming with Python.

We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then l show you how to produce and consume message to Pulsar using several different Python libraries including Python client, websockets, MQTT and even Kafka.

After this session you will building real-time streaming and messaging applications with Python.

Ingesting Data at Scale into Elasticsearch with Apache Pulsar

One of the best things about Elasticsearch is its ability to handle large amounts of data and serve this data with sub-millisecond latency, which makes it an ideal platform to run analytics workloads. But like any purpose-built database, there are always trade-offs to consider. Elasticsearch's case is how to load the data continuously and at scale. A way to solve this problem is by using a buffer layer that can store and forward events to Elasticsearch. Apache Pulsar provides a great alternative to implement this layer.

This talk will explain how Pulsar can implement data ingestion, validation, aggregation, and storage and push this data to Elasticsearch using the sink connector. It will provide the necessary knowledge for you to ingest any data of data, such as logs, sensor data, and streaming events into Elasticsearch for analytics and visualization.

FLiP Into Apache Pulsar Apps with MongoDB

In this session, I will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming application with a variety of OSS libraries, schemas, languages, frameworks and tools against MongoDB. We will show you all the options from MQTT, Web Sockets, Java, Golang, Python, NodeJS, Apache NiFi, Kafka on Pulsar, Pulsar protocol and more. You will FLiP your lid on how much you learn in a short time. I will send out instructions on the few steps you need to get an environment ready to start building awesome apps. We'll also show you how to quickly deploy an app to a production cloud cluster with StreamNative.

Utilizing Apache Kafka, Apache NiFi and MiNiFi for EdgeAI IoT at Scale

A hands-on deep dive on using Apache Kafka, Kafka Streams, Apache NiFi + Edge Flow Manager + MiniFi Agents with Apache MXNet, OpenVino, TensorFlow Lite, and other Deep Learning Libraries on the actual edge devices including Raspberry Pi with Movidius 2, Google Coral TPU and NVidia Jetson Nano. We run deep learning models on the edge devices and send images, capture real-time GPS and sensor data. With our low coding IoT applications providing easy edge routing, transformation, data acquisition and alerting before we decide what data to stream real-time to our data space. These edge applications classify images and sensor readings real-time at the edge and then send Deep Learning results to Kafka Streams and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging data to various Apache data stores including Apache Kudu and Apache HBase.

https://www.datainmotion.dev/2019/08/updating-machine-learning-models-at.html

Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp

As the Pulsar communities grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache Streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed.

I will walk through how to get started, some use cases and demos and answer questions.

Hail Hydrate! From Stream to Lake with Pulsar and Friends

A cloud data lake that is empty is not useful to anyone.

How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed. Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, Logs, REST, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before.

I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.

FLiP Stack for Cloud Data Lakes

Utilizing an all Apache stack for Rapid Data Lake Population and querying utilizing Apache Flink, Apache Pulsar and Apache NiFi.

We can quickly stream data to and from any datalake, data lake house, lakehouse, database or any datamart regardless of cloud or size. FLiP allows for Java and Python developers to build scalable solutions that span messaging and streaming in cloud native fashion with full monitoring.

Apache Pulsar with MQTT for Edge Computing

Today we will span from edge to any and all clouds to support data collection, real-time streaming, sensor ingest, edge computing, IoT use cases and edge AI. Apache Pulsar allows us to build computing at the edge and produce and consume messages at scale in any IoT, hybrid or cloud environment. Apache Pulsar supports MoP which allows for MQTT protocol to be used for high speed messaging.

We will teach you to quickly build scalable open source streaming applications regardless of if you are running in containers, pods, edge devices, VMs, on-premise servers, moving vehicles and any cloud.

Continuous SQL with Kafka and Flink

In this talk, I will walk through how someone can setup and run continous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data.

We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic over view will show hands-on techniques, tips and examples of how to do this.

Apache NiFi 101: Introduction and Best Practices

https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
https://github.com/tspannhw/EverythingApacheNiFi
https://www.datainmotion.dev/2020/12/basic-understanding-of-cloudera-flow.html
https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html

In this talk, we will walk step by step through Apache NiFi from the first load to first application. I will include slides, articles and examples to take away as a Quick Start to utilizing Apache NiFi in your real-time dataflows. I will help you get up and running locally on your laptop, Docker or in CDP Public Cloud.

I will cover:
Terminology
Flow Files
Version Control
Repositories
Basic Record Processing
Provenance
Backpressure
Prioritizers
System Diagnostics
Processors
Process Groups
Scheduling and Cron
Bulletin Board
Relationships
Routing
Tasks
Networking
Basic Cluster Architecture
Listeners
Controller Services
Remote Ports
Handling Errors
Funnels

Real-Time Streaming in Any and All Clouds, Hybrid and Beyond

Description
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the scale and as events arrive.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai Apache MXNet.

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Tags
AI + Machine Learning Databases Developer Tools Hybrid Integration Internet of Things

Real-Time Streaming in Azure

Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the scale and as events arrive.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai Apache MXNet.

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Pack Your Bags, We’re Going on a Data Journey!

This three-hour workshop is aimed at organizations who have (or are about to) embark(ed) on their data journey, and are looking for guidance on best practices, tools, and recommendations on navigating through the full data science lifecycle from collection to visualization.

Participants will be exposed to a variety of speakers and data experts to illuminate the critical elements that go into making their data journey a success. The session will kick off with a keynote speaker that will provide an overview of the data journey, followed by a hands-on demonstration highlighting the various personas needed in a data team participating in this journey. The demo will also showcase some of the open-source tools used by experts in the field, while using datasets and use cases relevant to nonprofits. Finally, participants will rotate between breakout sessions to further explore each of these tools and personas, and to give them an opportunity to speak with data specialists who can help address their specific data questions and challenges.

Participants will leave this interactive workshop armed with a stronger understanding and a roadmap to embark on their data journey successfully. We will also be incorporating best practices and learnings from our successful workshop at NetHope 2019.

Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks

Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately using the all Apache Mm FLaNK stack we can do this with ease! Streaming AI Powered Analytics From the Edge to the Data Center is now a simple use case. With MiNiFi we can ingest the data, do data checks, cleansing, run machine learning and deep learning models and route our data in real-time to Apache NiFi and/or Apache Kafka for further transformations and processing. Apache Flink will provide our advanced streaming capabilities fed real-time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in Apache Kudu via Apache NiFi for final SQL analytics.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Using the Mm FLaNK Stack for Edge AI (Flink, NiFi, Kafka, Kudu)

Introducing the FLaNK stack which combines Apache Flink, Apache NiFi, Apache Kafka and Apache Kudu to build fast applications for IoT, AI, rapid ingest.

FLaNK provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.

https://www.flankstack.dev/

Tools
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

CloudX 2024 Sessionize Event

November 2024 Santa Clara, California, United States

Budapest Data+ML Forum 2024 Sessionize Event

June 2024 Budapest, Hungary

Open Source Analytics Conference 2023 Sessionize Event

December 2023

JCON WORLD 2023 Sessionize Event

November 2023

Pulsar Summit North America 2023 Sessionize Event

October 2023 San Francisco, California, United States

AI DevWorld 2023 Sessionize Event

October 2023 Santa Clara, California, United States

WeAreDevelopers World Congress 2023 Sessionize Event

July 2023 Berlin, Germany

Big Data Fest by SoftServe Sessionize Event

May 2023

Pulsar Virtual Summit Europe 2023 Sessionize Event

May 2023

Real-Time Analytics Summit 2023 Sessionize Event

April 2023 San Francisco, California, United States

Devnexus 2023 Sessionize Event

April 2023 Atlanta, Georgia, United States

ScyllaDB Summit 2023 Sessionize Event

February 2023

Pulsar Summit Asia 2022 Sessionize Event

November 2022

2022 All Day DevOps Sessionize Event

November 2022

AI DevWorld 2022 Sessionize Event

October 2022 San Jose, California, United States

Data on Kubernetes Day @ Kubecon / CloudNativeCon NA 2022 Sessionize Event

October 2022 Detroit, Michigan, United States

Current 2022: The Next Generation of Kafka Summit Sessionize Event

October 2022 Austin, Texas, United States

JConf.dev 2022 Sessionize Event

September 2022 Chicago, Illinois, United States

Cloud Lunch and Learn Sessionize Event

July 2022

SQLBits 2022 Sessionize Event

March 2022 London, United Kingdom

Elastic Community Conference 2022 Sessionize Event

February 2022

Scylla Summit 2022 Sessionize Event

February 2022

DeveloperWeek 2022 Sessionize Event

February 2022 Oakland, California, United States

GDG DevFest UK & Ireland Sessionize Event

January 2022 London, United Kingdom

DataMinutes #2 Sessionize Event

January 2022

Pulsar Summit Asia 2021 Sessionize Event

January 2022

Porto Tech Hub Conference 2021 Sessionize Event

November 2021

Automation + DevOps Summit Sessionize Event

November 2021 Nashville, Tennessee, United States

PASS Data Community Summit 2021 Sessionize Event

November 2021

API World 2021 Sessionize Event

October 2021

InfluxDays North America Virtual Experience 2021 Sessionize Event

October 2021

AI DevWorld 2021 Sessionize Event

October 2021

Big Mountain Data and Dev Conference Sessionize Event

October 2021

Northern VA CodeCamp Fall 2021 Sessionize Event

October 2021

DBCC International 2021 Sessionize Event

October 2021

Scenic City Summit 2021 Sessionize Event

September 2021

Apache Con Global

September 2021 New Orleans, Louisiana, United States

Music City Tech 2021 Sessionize Event

September 2021

WorldFestival 2021 Sessionize Event

August 2021

Apache Con Asia

FLaNK

August 2021 Tokyo, Japan

AI and IoT Bulgaria Summit 2021 Sessionize Event

June 2021 Sofia, Bulgaria

DeveloperWeek Europe 2021 Sessionize Event

April 2021

AI DevWorld 2020 Sessionize Event

October 2020 San Jose, California, United States

NetHope Global Summit 2020 Sessionize Event

October 2020 New York City, New York, United States

Flink Forward Global Virtual 2020 Sessionize Event

October 2020

Timothy Spann

Senior Sales Engineer

Princeton, New Jersey, United States

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top