Timothy Spann

Information & Communications Technology

Media & Information

apache nifi apache flink apache kafka minifi iot AI IOT and Android Things IoT Industrial IoT IIOT Deep Learning cloud Cloud & Infrastructure Cloud & DevOps Cloud data lake use cases Big Data AWS Databases BI on the data lake All things data Streaming Data Analytics Data Streaming Event Streaming Streaming AWS Cloud Containers Kubernetes Streaming Data Big Data HPC Apache Pulsar AWS Data AI STUFF: Big Data Quantum computing & Machine Learning All things data in Azure AWS GCP and on-premises Analytics and Big Data AWS Data & AI Azure Data Iot Edge #IoT Automotive IoT IoT Edge AI Database Data Science Azure Data Platform Azure SQL Database Databases Data Science & AI Azure Data Factory Data Visualization Azure Data & AI Data Warehousing Data Management Database Administration data engineering Data Platform Azure Data Lake Microsoft Data Platform Power BI Dataflows

Princeton, New Jersey, United States

Developer Advocate @ StreamNative

Tim Spann is a Developer Advocate at StreamNative, where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, and a Senior Field Engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.

https://www.datainmotion.dev/p/about-me.html
https://dzone.com/users/297029/bunkertor.html
https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963

  datainmotion.dev (blog)
  streamnative.io (company)

Current sessions

BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH JAVA MICROSERVICES

In this talk we will walk through how to build event streaming applications as functions running on cloud native messaging via Apache Pulsar, at near-infinite scale in any cloud, Docker or Kubernetes. We will show you how to deploy ML functions to transform real-time data for IoT, streaming analytics and many other use cases. After this talk you will be able to build Java microservices with ease and deploy them anywhere utilizing the open source unified streaming and messaging platform, Apache Pulsar. Finally, we will show you how to add dashboards with WebSockets, no-code data sinks, integration with Apache NiFi data pipelines, SQL reports with Apache Spark and continuous ETL with Apache Flink. I have built many of these applications for many organizations as part of the FLiPN Stack. Let's build next generation applications today, regardless of whether your data comes from REST APIs, sensors, logs, NoSQL sources, events or database tables.

https://github.com/tspannhw?tab=repositories&q=FLiP&type=source


Building FLiPN Stack Edge AI Applications

Introducing the FLiPN stack, which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI and rapid ingest with Java, C#, Python or Golang.

FLiPN provides a quick set of tools to build applications at any scale for any streaming or IoT use case.

Apache Pulsar enables Java applications to communicate asynchronously at any scale, geo-replicate and interact with non-JVM applications. Pulsar also acts as a function mesh to run Java functions as a FaaS triggered by events. All of this is open source and includes an integrated Schema Registry with support for JSON, Avro, text and Protobuf schemas.

Tools
Java, Golang, Python, C#, Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI

References
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html


Apache Pulsar Development 101 with Python

In this session I will get you started with real-time cloud native streaming programming with Python.

We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then show you how to produce and consume messages with Pulsar using several different Python libraries and protocols, including the Python client, WebSockets, MQTT and even Kafka.

After this session you will be building real-time streaming and messaging applications with Python.
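
As a rough illustration of the produce-and-consume step, here is a minimal sketch using the `pulsar-client` Python library. It is not taken from the session material. The service URL assumes a default local standalone cluster, and the topic name, subscription name, record fields and helper functions are all hypothetical.

```python
import json
import time

# Assumptions: default standalone service URL and a hypothetical topic name.
SERVICE_URL = "pulsar://localhost:6650"
TOPIC = "persistent://public/default/sensor-readings"


def encode_reading(sensor_id, temperature_c, ts=None):
    """Serialize one reading to UTF-8 JSON bytes (producers send bytes)."""
    record = {
        "sensor_id": sensor_id,
        "temperature_c": temperature_c,
        "ts": ts if ts is not None else time.time(),
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_reading(payload):
    """Turn a received message payload back into a dict."""
    return json.loads(payload.decode("utf-8"))


def round_trip(readings):
    """Produce readings, then consume them back.

    Requires `pip install pulsar-client` and a running broker, so the
    import is done lazily and this function is not called on load.
    """
    import pulsar

    client = pulsar.Client(SERVICE_URL)
    try:
        # Subscribe first so the subscription exists before publishing.
        consumer = client.subscribe(TOPIC, subscription_name="demo-sub")
        producer = client.create_producer(TOPIC)
        for sensor_id, temperature_c in readings:
            producer.send(encode_reading(sensor_id, temperature_c))
        received = []
        for _ in readings:
            msg = consumer.receive(timeout_millis=5000)
            received.append(decode_reading(msg.data()))
            consumer.acknowledge(msg)  # mark as processed
        return received
    finally:
        client.close()
```

With a standalone cluster running, `round_trip([("pi-1", 21.5)])` should return the reading it just published. The encode and decode helpers work without a broker.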


Ingesting Data at Scale into Elasticsearch with Apache Pulsar

One of the best things about Elasticsearch is its ability to handle large amounts of data and serve this data with sub-millisecond latency, which makes it an ideal platform to run analytics workloads. But like any purpose-built database, there are always trade-offs to consider. In Elasticsearch's case, the trade-off is how to load data continuously and at scale. One way to solve this problem is to use a buffer layer that can store and forward events to Elasticsearch. Apache Pulsar is a great option for implementing this layer.

This talk will explain how Pulsar can implement data ingestion, validation, aggregation, and storage and push this data to Elasticsearch using the sink connector. It will provide the necessary knowledge for you to ingest any type of data, such as logs, sensor data, and streaming events, into Elasticsearch for analytics and visualization.
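
The validation step could be sketched as a Pulsar Function written as a plain Python function named `process`. This is only an illustration, not the talk's actual code. The required field names are hypothetical, and it assumes that returning `None` means no message is published to the output topic feeding the Elasticsearch sink.

```python
import json

# Hypothetical schema for incoming events.
REQUIRED_FIELDS = ("sensor_id", "temperature_c", "ts")


def process(input):
    """Validate one JSON event; return cleaned JSON, or None to drop it."""
    try:
        event = json.loads(input)
    except (TypeError, ValueError):
        return None  # not parseable as JSON
    if not isinstance(event, dict):
        return None
    if any(field not in event for field in REQUIRED_FIELDS):
        return None  # missing a required field
    try:
        temperature = float(event["temperature_c"])
    except (TypeError, ValueError):
        return None  # temperature is not numeric
    cleaned = {
        "sensor_id": str(event["sensor_id"]),
        "temperature_c": round(temperature, 2),
        "ts": event["ts"],
    }
    return json.dumps(cleaned, sort_keys=True)
```

Keeping validation in a small function like this means the sink connector only ever sees well-formed documents, and the function can be tested locally without a cluster.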


FLiP Into Apache Pulsar Apps with MongoDB

In this session, I will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks and tools against MongoDB. We will show you all the options, from MQTT, WebSockets, Java, Golang, Python, NodeJS, Apache NiFi, Kafka on Pulsar, the Pulsar protocol and more. You will FLiP your lid at how much you learn in a short time. I will send out instructions on the few steps you need to get an environment ready to start building awesome apps. We'll also show you how to quickly deploy an app to a production cloud cluster with StreamNative.


Utilizing Apache Kafka, Apache NiFi and MiNiFi for EdgeAI IoT at Scale

A hands-on deep dive on using Apache Kafka, Kafka Streams, Apache NiFi + Edge Flow Manager + MiNiFi agents with Apache MXNet, OpenVINO, TensorFlow Lite and other deep learning libraries on actual edge devices, including Raspberry Pi with Movidius 2, Google Coral TPU and NVIDIA Jetson Nano. We run deep learning models on the edge devices, send images, and capture real-time GPS and sensor data. Our low-code IoT applications provide easy edge routing, transformation, data acquisition and alerting before we decide what data to stream in real time to our data space. These edge applications classify images and sensor readings in real time at the edge and then send deep learning results to Kafka Streams and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging of data into various Apache data stores, including Apache Kudu and Apache HBase.

https://www.datainmotion.dev/2019/08/updating-machine-learning-models-at.html


Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp

As the Pulsar community grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed.

I will walk through how to get started, some use cases and demos and answer questions.


Hail Hydrate! From Stream to Lake with Pulsar and Friends

A cloud data lake that is empty is not useful to anyone.

How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, logs, REST, XML, images, PDFs, documents, text, semi-structured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before.

I will teach you how to fish in the deep end of the lake and return as a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.


FLiP Stack for Cloud Data Lakes

Using an all-Apache stack of Apache Flink, Apache Pulsar and Apache NiFi for rapid data lake population and querying.

We can quickly stream data to and from any data lake, lakehouse, database or data mart regardless of cloud or size. FLiP allows Java and Python developers to build scalable solutions that span messaging and streaming in a cloud native fashion with full monitoring.


Apache Pulsar with MQTT for Edge Computing

Today we will span from the edge to any and all clouds to support data collection, real-time streaming, sensor ingest, edge computing, IoT use cases and edge AI. Apache Pulsar allows us to build computing at the edge and produce and consume messages at scale in any IoT, hybrid or cloud environment. Apache Pulsar supports MoP (MQTT on Pulsar), which allows the MQTT protocol to be used for high speed messaging.

We will teach you to quickly build scalable open source streaming applications regardless of whether you are running in containers, pods, edge devices, VMs, on-premises servers, moving vehicles or any cloud.


Continuous SQL with Kafka and Flink

In this talk, I will walk through how to set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, defining schemas and publishing data.

We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips and examples of how to do this.
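
To give a flavor of what such continuous SQL might look like, here is a small sketch that builds Flink SQL statements in Python and, optionally, submits them through PyFlink. It is not from the talk. The topic names, columns, broker address, threshold and the `kafka_table` helper are all hypothetical, and running it assumes `apache-flink` is installed, the Kafka SQL connector jar is on the classpath, and a broker is reachable.

```python
# Assumed local broker address.
BOOTSTRAP_SERVERS = "localhost:9092"


def kafka_table(name, columns, topic):
    """Build a CREATE TABLE statement that maps a Kafka topic to a Flink table."""
    column_sql = ",\n  ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE TABLE {name} (\n  {column_sql}\n) WITH (\n"
        f"  'connector' = 'kafka',\n"
        f"  'topic' = '{topic}',\n"
        f"  'properties.bootstrap.servers' = '{BOOTSTRAP_SERVERS}',\n"
        f"  'properties.group.id' = 'continuous-sql-demo',\n"
        f"  'scan.startup.mode' = 'earliest-offset',\n"
        f"  'format' = 'json'\n)"
    )


DDL_STATEMENTS = [
    kafka_table("readings", [("sensor_id", "STRING"), ("temperature_c", "DOUBLE"),
                             ("ts", "TIMESTAMP(3)")], "readings"),
    kafka_table("sensors", [("sensor_id", "STRING"), ("location", "STRING")], "sensors"),
    kafka_table("alerts", [("sensor_id", "STRING"), ("location", "STRING"),
                           ("temperature_c", "DOUBLE"), ("ts", "TIMESTAMP(3)")], "alerts"),
]

# Continuous query: join two topics and write matching events to a third.
INSERT_SQL = """
INSERT INTO alerts
SELECT r.sensor_id, s.location, r.temperature_c, r.ts
FROM readings AS r
JOIN sensors AS s ON r.sensor_id = s.sensor_id
WHERE r.temperature_c > 30
"""


def run():
    """Submit the statements with PyFlink (imported lazily; not called on load)."""
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    for ddl in DDL_STATEMENTS:
        t_env.execute_sql(ddl)
    t_env.execute_sql(INSERT_SQL).wait()  # streaming job runs until cancelled
```

The join here is a regular (unbounded) join, which keeps state for both inputs. A production job would more likely use an interval or temporal join, or set a state TTL.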


Apache NiFi 101: Introduction and Best Practices

https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
https://github.com/tspannhw/EverythingApacheNiFi
https://www.datainmotion.dev/2020/12/basic-understanding-of-cloudera-flow.html
https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html

In this talk, we will walk step by step through Apache NiFi from the first load to first application. I will include slides, articles and examples to take away as a Quick Start to utilizing Apache NiFi in your real-time dataflows. I will help you get up and running locally on your laptop, Docker or in CDP Public Cloud.

I will cover:
Terminology
Flow Files
Version Control
Repositories
Basic Record Processing
Provenance
Backpressure
Prioritizers
System Diagnostics
Processors
Process Groups
Scheduling and Cron
Bulletin Board
Relationships
Routing
Tasks
Networking
Basic Cluster Architecture
Listeners
Controller Services
Remote Ports
Handling Errors
Funnels


Real-Time Streaming in Any and All Clouds, Hybrid and Beyond

Description
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at scale and as events arrive.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Tags
AI + Machine Learning Databases Developer Tools Hybrid Integration Internet of Things


Real-Time Streaming in Azure

Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at scale and as events arrive.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK


Pack Your Bags, We’re Going on a Data Journey!

This three-hour workshop is aimed at organizations that have embarked (or are about to embark) on their data journey and are looking for guidance on best practices, tools, and recommendations for navigating the full data science lifecycle from collection to visualization.

Participants will be exposed to a variety of speakers and data experts to illuminate the critical elements that go into making their data journey a success. The session will kick off with a keynote speaker who will provide an overview of the data journey, followed by a hands-on demonstration highlighting the various personas needed in a data team participating in this journey. The demo will also showcase some of the open-source tools used by experts in the field, while using datasets and use cases relevant to nonprofits. Finally, participants will rotate between breakout sessions to further explore each of these tools and personas, and to give them an opportunity to speak with data specialists who can help address their specific data questions and challenges.

Participants will leave this interactive workshop armed with a stronger understanding and a roadmap to embark on their data journey successfully. We will also be incorporating best practices and learnings from our successful workshop at NetHope 2019.


Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks

Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately, using the all-Apache Mm FLaNK stack, we can do this with ease! Streaming AI-powered analytics from the edge to the data center is now a simple use case. With MiNiFi we can ingest the data, do data checks and cleansing, run machine learning and deep learning models, and route our data in real time to Apache NiFi and/or Apache Kafka for further transformations and processing. Apache Flink will provide our advanced streaming capabilities, fed in real time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in Apache Kudu via Apache NiFi for final SQL analytics.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK


Using the Mm FLaNK Stack for Edge AI (Flink, NiFi, Kafka, Kudu)

Introducing the FLaNK stack, which combines Apache Flink, Apache NiFi, Apache Kafka and Apache Kudu to build fast applications for IoT, AI and rapid ingest.

FLaNK provides a quick set of tools to build applications at any scale for any streaming or IoT use case.

https://www.flankstack.dev/

Tools
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html


Past and future events

AI DevWorld 2022

25 Oct 2022 - 3 Nov 2022
San Jose, California, United States

Current 2022: The Next Generation of Kafka Summit

4 Oct 2022 - 5 Oct 2022
Austin, Texas, United States

SQLBits

8 Mar 2022 - 12 Mar 2022
London, England, United Kingdom

Elastic Community Conference 2022

11 Feb 2022 - 12 Feb 2022

Scylla Summit 2022

9 Feb 2022 - 10 Feb 2022

DeveloperWeek 2022

2 Feb 2022 - 9 Feb 2022
Oakland, California, United States

GDG DevFest UK & Ireland

29 Jan 2022
London, England, United Kingdom

DataMinutes #2

21 Jan 2022

Pulsar Summit Asia 2021

15 Jan 2022 - 16 Jan 2022

Cloud Lunch and Learn

1 Jan 2021 - 31 Dec 2021

Porto Tech Hub Conference 2021

18 Nov 2021 - 19 Nov 2021

Automation + DevOps Summit

15 Nov 2021 - 17 Nov 2021
Nashville, Tennessee, United States

PASS Data Community Summit 2021

8 Nov 2021 - 12 Nov 2021

API World 2021

26 Oct 2021 - 28 Oct 2021

AI DevWorld 2021

26 Oct 2021 - 28 Oct 2021

InfluxDays North America Virtual Experience 2021

26 Oct 2021 - 27 Oct 2021

Big Mountain Data and Dev Conference

22 Oct 2021 - 23 Oct 2021

DBCC International 2021

15 Oct 2021

Scenic City Summit 2021

24 Sep 2021

Apache Con Global

21 Sep 2021 - 23 Sep 2021
New Orleans, Louisiana, United States

Music City Tech 2021

15 Sep 2021 - 17 Sep 2021

WorldFestival 2021

17 Aug 2021 - 19 Aug 2021

Apache Con Asia

FLaNK
6 Aug 2021 - 8 Aug 2021
Tokyo, Japan

AI and IoT Bulgaria Summit 2021

26 Jun 2021
Sofia, Sofia-Capital, Bulgaria

DeveloperWeek Europe 2021

27 Apr 2021 - 28 Apr 2021

NetHope Global Summit 2020

26 Oct 2020 - 30 Oct 2020
New York City, New York, United States

AI DevWorld 2020

27 Oct 2020 - 29 Oct 2020
San Jose, California, United States

Flink Forward Global Virtual 2020

19 Oct 2020 - 22 Oct 2020