Timothy Spann

Senior Solutions Engineer

Princeton, New Jersey, United States

Tim Spann is a Senior Solutions Engineer - Financial Services working with Snowflake, Cortex AI, Apache NiFi, Apache Iceberg, Python, Apache Polaris, Streamlit, Cloud Data, Vectors, Data, Generative AI, ML, Hugging Face, Apache Kafka, Apache Flink, PyTorch, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Zilliz/Milvus, Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.

https://medium.com/@tspann
https://www.youtube.com/@FLaNK-Stack
https://www.datainmotion.dev/p/about-me.html
https://dzone.com/users/297029/bunkertor.html

Area of Expertise

Business & Management
Finance & Banking
Government, Social Sector & Education
Information & Communications Technology
Media & Information

Topics

apache nifi
apache flink
apache kafka
minifi
iot
AI
IOT and Android Things
IoT
Industrial IoT
IIOT
Deep Learning
cloud
Cloud & Infrastructure
Cloud & DevOps
Cloud data lake use cases
Big Data
AWS Databases
BI on the data lake
All things data
Streaming Data Analytics
Data Streaming
Event Streaming
Streaming
AWS Cloud Containers Kubernetes Streaming Data Big Data HPC
Apache Pulsar
AWS Data
AI STUFF: Big Data Quantum computing & Machine Learning
All things data in Azure AWS GCP and on-premises
Analytics and Big Data
AWS Data & AI
Azure Data
Iot Edge
#IoT
Automotive IoT
IoT Edge AI
Database
Data Science
Azure Data Platform
Azure SQL Database
Databases
Data Science & AI
Azure Data Factory
Data Visualization
Azure Data & AI
Data Warehousing
Data Management
Database Administration
data engineering
Data Platform
Azure Data Lake
Microsoft Data Platform
Power BI Dataflows
streaming sql
SQL
ai
Artificial Intelligence
Internet of Things (IoT)
artificial intelligence risk
Machine Learning and Artificial Intelligence
PostgreSQL
Apache Iceberg
Amazon Web Services
microservices
Microservice Architecture
generative ai
Java with Generative AI & LLMs
The Generative AI LLM Revolution (ChatGPT)
milvus
vector database
Vector Database
Vector Databases
genai
generativeai
Applied Generative AI
GenAI
azure genai
Generative AI Use Cases
Enterprise Patterns in Generative AI
Generative Art
Generative Coding
generative adversarial networks
Synthetic Data Generation using Generative AI and Amazon Bedrock
Machine Learning/Artificial Intelligence
Cloud Native

Utilizing Real-Time Transit Data for Travel Optimization

There are a lot of factors involved in determining how you can find your way around and avoid delays, bad weather, dangers and expenses. In this talk I will focus on public transport in the largest transit system in the United States, the MTA, which is focused around New York City. Utilizing public and semi-public data feeds, this can be extended to most city and metropolitan areas around the world. As a personal example, I live in New Jersey and this is an extremely useful use of open source and public data.

Once I am notified that I need to travel to Manhattan, I need to start my data streams flowing. Most of the data sources are REST feeds that are ingested by Apache NiFi to transform, convert, enrich and finalize it for usage in Parquet files stored as Apache Iceberg tables.

Unleashing the Potential of Cloud-Native Vector Databases

In this talk we will present why you should add a cloud-native vector database to your Data and AI platform. Milvus lets you scale out and improve your AI use cases through RAG, real-time search, multimodal search, recommendation engines, fraud detection and many more emerging use cases.

I will show you how you can quickly get started and how easy it is to deploy in your own environment.

Introduction to Milvus

In this talk I will do an introduction to Milvus and then describe the newest version running on AWS. I will also demonstrate a number of use cases of how to supercharge Generative AI with Milvus.

From Air Quality to Aircraft & Automobiles, Unstructured Data Is Everywhere

This talk explores how Apache NiFi can be used to integrate open source LLMs to implement scalable and efficient RAG pipelines. It shows how any kind of data, including semistructured, structured and unstructured data from a variety of sources and types, can be processed, queried, and used to feed large language models for smart, contextually aware answers. Look for the example utilizing Cortex AI, LLAMA, Apache NiFi, Apache Iceberg, Snowflake, open source tools, libraries, and Notebooks.

Building Apache NiFi 2.0 Python Processors

In this talk, I will delve into the world of Apache NiFi 2.0 Python processors, exploring the capabilities they offer and demonstrating how to build custom processors to enhance your data processing pipelines. Attendees will gain a deep understanding of the integration points between NiFi and Python, enabling them to leverage the extensive libraries and frameworks available in the Python ecosystem.

I will cover:
Introduction to Apache NiFi 2.0
Python Processors Deep Dive
Build your own custom Python Processor
Integrating Python Libraries and Frameworks
Debugging and Troubleshooting

By the end of this talk, participants will have a comprehensive understanding of building and optimizing Apache NiFi 2.0 Python processors, enabling them to integrate Python seamlessly into their data processing workflows. This session is suitable for data engineers, architects, and anyone interested in harnessing the combined power of Apache NiFi and Python for efficient data integration and flow management. One of the main uses is to build prompts and call ChatGPT.
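
To give a feel for what a custom processor looks like, here is a minimal sketch of a NiFi 2.0 Python processor that builds a prompt from a JSON FlowFile. The processor name `BuildPrompt`, the `question` field, the prompt wording and the `prompt.length` attribute are all assumptions made for illustration; they are not taken from the talk. The fallback classes in the `except ImportError` branch are stand-ins so the sketch can be tried outside NiFi.

```python
import json

try:
    # Available inside a NiFi 2.x Python extension environment.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Stand-ins (assumption, not NiFi code) so the sketch runs outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, contents=None, attributes=None):
            self.relationship = relationship
            self.contents = contents
            self.attributes = attributes


class BuildPrompt(FlowFileTransform):
    """Hypothetical processor: wraps the 'question' field of a JSON record in a prompt."""

    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1-SNAPSHOT"
        description = "Builds an LLM prompt from the 'question' field of a JSON FlowFile."

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        try:
            record = json.loads(flowfile.getContentsAsBytes().decode("utf-8"))
            question = record["question"]
        except (ValueError, KeyError, TypeError):
            # Malformed or unexpected input is routed to the failure relationship.
            return FlowFileTransformResult(relationship="failure")

        prompt = f"Answer concisely.\n\nQuestion: {question}\nAnswer:"
        return FlowFileTransformResult(
            relationship="success",
            contents=prompt,
            attributes={"prompt.length": str(len(prompt))},
        )
```

A downstream processor (for example an HTTP call to an LLM endpoint) could then consume the prompt as the FlowFile content. Calling the model is deliberately left out of this sketch.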

NiFi Man: “We're here – but should we have come?”

The last few years, travel has been tough with diseases, air quality problems, fires, airline delays, wars and other events. The only way to know is to measure the conditions and make that decision. So using ASF projects including NiFi, MiNiFi, Iceberg, Kafka, Arrow, Calcite, Tika and Flink I will do just that.

There are so many streams of data to look at to determine if it's worth the trip from flights, delays, the weather, air quality, local sensors, travel advisories, reviews, social media, local transit and more.

So I looked at everything and determined yes Denver is worth the trip from sunny New Jersey. And I'll show you how to make those decisions too.

https://medium.com/@tspann/harnessing-the-power-of-nifi-building-a-seamless-flow-to-ingest-pm2-5-90246393fcab

https://medium.com/@tspann/building-a-travel-advisory-app-with-apache-nifi-in-k8-969b44c84958

Yes, I love "Travel Man".

Adding Generative AI to Real-Time Streaming Pipelines

In this talk I walk through various use cases where bringing real-time data to LLMs solves some interesting problems.

In one case we use Apache NiFi to provide a live chat between a person in Slack and several LLMs, all orchestrated via NiFi and Kafka. In another case NiFi ingests live travel data and feeds it to HuggingFace and OLLAMA LLMs for summarization. I also run a live chatbot. We also augment LLM prompts and results with live data streams. All with ASF projects. I call this pattern FLaNK AI.

https://github.com/tspannhw/FLaNK-HuggingFace-BLOOM-LLM
https://medium.com/@tspann/mixtral-generative-sparse-mixture-of-experts-in-dataflows-59744f7d28a9
https://medium.com/@tspann/building-an-llm-bot-for-meetups-and-conference-interactivity-c211ea6e3b61

Building Real-time Pipelines with FLaNK: A Case Study with Transit Data

In this session, we will explore the powerful combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines. We will present a case study using the FLaNK-MTA project, which leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). By integrating Flink, NiFi, and Kafka, FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.

Takeaways:

Understanding the integration of Apache Flink, Apache NiFi, and Apache Kafka for real-time data processing
Insights into building scalable and fault-tolerant data processing pipelines
Best practices for data collection, transformation, and analytics with FLaNK-MTA as a reference
Knowledge of use cases and potential business impact of real-time data processing pipelines

Unlocking Financial Data with Real-Time Pipelines

Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.
Key Points to be Covered:
Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing.
Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.
Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. Integration of Kafka with financial data sources and consumers.
Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources.
Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions.
Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies.
Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability and performance optimization techniques.
Conclusion: In this talk, we will demonstrate the power of combining Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to unlock financial data's true potential. Attendees will gain insights into how these technologies can empower financial institutions to make informed decisions, respond to market changes swiftly, and comply with regulations effectively. Join us to explore the world of real-time data pipelines and revolutionize financial data management.

Empowering IoT with Real-time Stream Processing: Flink, NiFi, and Pulsar

The rapid growth of the Internet of Things (IoT) has generated an enormous volume of data that organizations must harness to gain valuable insights and drive actionable outcomes. To address the challenges of processing IoT data at scale, this talk proposal aims to explore the powerful combination of Apache Flink, Apache NiFi, and Apache Pulsar. We will delve into how these cutting-edge technologies can empower IoT applications with real-time stream processing, seamless data integration, and reliable message queuing.

Building a Full Lifecycle Streaming Data Pipeline

In this talk, we will delve into the process of building a full lifecycle streaming data pipeline using Apache Airflow, Apache Kafka, and Apache Iceberg. We will cover the key features and capabilities of each tool, and demonstrate how they can be integrated to create a robust and efficient pipeline for handling real-time streaming data.

By combining the power of Apache Kafka, Apache Airflow, Apache NiFi and Apache Iceberg, developers can build a full lifecycle streaming data pipeline that is capable of efficiently handling real-time data at scale. This talk will provide a comprehensive overview of how to utilize these tools to build a reliable and effective streaming data pipeline.

Building a Real-Time IoT Application with Apache Pulsar and Apache Pinot

We will walk step-by-step with live code and demos on how to build a real-time IoT application with Pinot + Pulsar.

First, we stream sensor data from an edge device monitoring location conditions to Pulsar via a Python application.

We have our Apache Pinot "realtime" table connected to Pulsar via the pinot-pulsar stream ingestion connector.

Our data streams into the table, and we visualize it with Superset.
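
The first step above, a Python application sending sensor readings to Pulsar, might look roughly like the sketch below. The field names, topic name and service URL are assumptions for illustration; the actual application lives in the source repository linked below. The `pulsar` import is done lazily inside `publish` so the payload helper can be tried without a broker or the `pulsar-client` package.

```python
import json
import time


def build_reading(device_id, temperature_c, humidity_pct, ts=None):
    """Serialize one sensor reading as UTF-8 JSON bytes (field names are assumptions)."""
    reading = {
        "device_id": device_id,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        # Epoch milliseconds; defaults to the current time.
        "ts": int(ts if ts is not None else time.time() * 1000),
    }
    return json.dumps(reading).encode("utf-8")


def publish(readings, service_url="pulsar://localhost:6650",
            topic="persistent://public/default/sensors"):
    """Send payloads to Pulsar. Needs the pulsar-client package and a running broker."""
    import pulsar  # lazy import: only required when actually publishing

    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        for payload in readings:
            producer.send(payload)
        producer.flush()
    finally:
        client.close()
```

With a standalone Pulsar running locally, `publish([build_reading("pi-1", 21.5, 40.0)])` would send a single reading to the assumed topic.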

https://medium.com/@tspann/building-a-real-time-iot-application-with-apache-pulsar-and-apache-pinot-1e3baf8c1824

Source Code
https://github.com/tspannhw/pulsar-thermal-pinot

Reference
https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/apache-pulsar

https://dev.startree.ai/docs/pinot/recipes/pulsar

Sink Your Teeth into Streaming at Any Scale

Using the low-latency Apache Pulsar we can build up millions of streams of concurrent data and join them in real time with Apache Flink. We need an ultra-low latency database that can support these workloads to build next-generation IoT, financial and instant analytical transit applications.

By sinking data into ScyllaDB we enable amazingly fast applications that can grow to any size and join with existing data sources.

The next generation of apps is being built now, so you must choose the right low-latency, scalable platform for these massively data-intensive applications. ScyllaDB + Pulsar + Flink is that platform. Choose Open, Choose Fast, and Make the right choice.

Building Modern Data Streaming Apps

In my session, I will show you some best practices I have discovered over the last 7 years in building data streaming applications including IoT, CDC, Logs, and more.

In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into ScyllaDB.

We use the best streaming tools for the current applications with FLiPN and FLaNK. https://www.flipn.app/

Deploying Machine Learning Models with Pulsar Functions

In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. In order to provide a prediction in real-time, the model usually receives a single data point from the caller, and is expected to provide an accurate prediction within a few milliseconds.
Throughout this talk, I will demonstrate the steps required to deploy a fully trained ML model that predicts the delivery time for a food delivery service based upon real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.
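
As a rough illustration of the single-data-point pattern described above, here is a sketch of a Python Pulsar Function that scores one order per message. The linear coefficients are hypothetical placeholders standing in for a trained model, and the field names (`order_id`, `distance_km`, `traffic_level`) are assumptions; none of them come from the talk. The fallback `Function` class only exists so the sketch runs without Pulsar installed.

```python
import json

try:
    from pulsar import Function  # provided by the pulsar-client package
except ImportError:
    class Function:  # stand-in (assumption) so the sketch runs without Pulsar
        pass

# Hypothetical coefficients standing in for a trained model.
BASE_MINUTES = 10.0
MINUTES_PER_KM = 2.0
MINUTES_PER_TRAFFIC_LEVEL = 3.0


class DeliveryTimeFunction(Function):
    """Scores one incoming order and returns the prediction as JSON."""

    def process(self, input, context):
        order = json.loads(input)
        minutes = (
            BASE_MINUTES
            + MINUTES_PER_KM * order["distance_km"]
            + MINUTES_PER_TRAFFIC_LEVEL * order["traffic_level"]
        )
        # The returned value is published to the function's output topic.
        return json.dumps({"order_id": order["order_id"], "eta_minutes": round(minutes, 1)})
```

In a real deployment the coefficients would be replaced by a model loaded once at startup, so that each call to `process` only performs inference.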

Architecting Your First Event Driven Serverless Streaming Applications

Once you have built a topic in Apache Pulsar, you will quickly see the need to build event-driven applications. This can require a lot of decisions on what framework to use, where to run it, how to deploy it, and how to manage these applications.

I will walk you through, step by step, building Pulsar Functions, which are an easy way to design, test, develop, integrate, deploy, monitor, and manage serverless streaming applications in Java and Python.

Together we will build a full application as an Apache Pulsar function and enjoy the power of running it in the cloud for IoT events and add any routing, transformation, or machine learning that we need to accomplish our business requirements.

Build ML Enhanced Event Streaming Applications with Java Microservices

In this talk we will walk through how to build event streaming applications as functions running with cloud-native messaging via Apache Pulsar that run at near-infinite scale in any cloud, Docker or K8s. We will show you how to deploy ML functions to transform real-time data for IoT, Streaming Analytics and many other use cases. After this talk you will be able to build Java microservices with ease and deploy them anywhere utilizing the open source unified streaming and messaging platform, Apache Pulsar. Finally, we will show you how to add dashboards with Web Sockets, no-code data sinks, integration with Apache NiFi data pipelines, SQL reports with Apache Spark and continuous ETL with Apache Flink. I have built many of these applications for many organizations as part of the FLiPN Stack. Let's build next-generation applications today, regardless of whether your data is REST APIs, Sensors, Logs, NoSQL Sources, Events or Database tables.

https://github.com/tspannhw?tab=repositories&q=FLiP&type=source

Building FLiPN Stack Edge AI Applications

Introducing the FLiPN stack which combines Apache Flink, Apache NiFi, Apache Pulsar and other Apache tools to build fast applications for IoT, AI, rapid ingest with Java, C#, Python or Golang.

FLiPN provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.

Apache Pulsar enables Java applications to communicate asynchronously at any scale, geo-replicate and interact with non JVM applications. Pulsar also acts as a function mesh to run Java functions as a FaaS triggered by Events. All of this is open source and includes an integrated Schema Registry with support for JSON, Avro, Text and ProtoBuf schemas.

Tools
Java, Golang, Python, C#, Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet, DJL.AI

References
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html

Apache Pulsar Development 101 with Python

In this session I will get you started with real-time cloud native streaming programming with Python.

We will start off with a gentle introduction to Apache Pulsar and setting up your first easy standalone cluster. We will then show you how to produce and consume messages with Pulsar using several different Python libraries including the Python client, websockets, MQTT and even Kafka.

After this session you will be building real-time streaming and messaging applications with Python.

Ingesting Data at Scale into Elasticsearch with Apache Pulsar

One of the best things about Elasticsearch is its ability to handle large amounts of data and serve this data with sub-millisecond latency, which makes it an ideal platform to run analytics workloads. But like any purpose-built database, there are always trade-offs to consider. In Elasticsearch's case, the challenge is how to load the data continuously and at scale. A way to solve this problem is by using a buffer layer that can store and forward events to Elasticsearch. Apache Pulsar provides a great alternative to implement this layer.

This talk will explain how Pulsar can implement data ingestion, validation, aggregation, and storage and push this data to Elasticsearch using the sink connector. It will provide the necessary knowledge for you to ingest any kind of data, such as logs, sensor data, and streaming events, into Elasticsearch for analytics and visualization.

FLiP Into Apache Pulsar Apps with MongoDB

In this session, I will introduce you to the world of Apache Pulsar and how to build real-time messaging and streaming applications with a variety of OSS libraries, schemas, languages, frameworks and tools against MongoDB. We will show you all the options from MQTT, Web Sockets, Java, Golang, Python, NodeJS, Apache NiFi, Kafka on Pulsar, Pulsar protocol and more. You will FLiP your lid on how much you learn in a short time. I will send out instructions on the few steps you need to get an environment ready to start building awesome apps. We'll also show you how to quickly deploy an app to a production cloud cluster with StreamNative.

Utilizing Apache Kafka, Apache NiFi and MiNiFi for EdgeAI IoT at Scale

A hands-on deep dive on using Apache Kafka, Kafka Streams, Apache NiFi + Edge Flow Manager + MiNiFi agents with Apache MXNet, OpenVINO, TensorFlow Lite, and other deep learning libraries on the actual edge devices including Raspberry Pi with Movidius 2, Google Coral TPU and NVIDIA Jetson Nano. We run deep learning models on the edge devices, send images, and capture real-time GPS and sensor data. Our low-code IoT applications provide easy edge routing, transformation, data acquisition and alerting before we decide what data to stream in real time to our data space. These edge applications classify images and sensor readings in real time at the edge and then send deep learning results to Kafka Streams and Apache NiFi for transformation, parsing, enrichment, querying, filtering and merging data to various Apache data stores including Apache Kudu and Apache HBase.

https://www.datainmotion.dev/2019/08/updating-machine-learning-models-at.html

Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp

As the Pulsar community grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache Streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed.

I will walk through how to get started, some use cases and demos and answer questions.

Hail Hydrate! From Stream to Lake with Pulsar and Friends

A cloud data lake that is empty is not useful to anyone.

How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, Logs, REST, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before.

I will teach you how to fish in the deep end of the lake and return as a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.

FLiP Stack for Cloud Data Lakes

Utilizing an all Apache stack for Rapid Data Lake Population and querying utilizing Apache Flink, Apache Pulsar and Apache NiFi.

We can quickly stream data to and from any datalake, data lake house, lakehouse, database or any datamart regardless of cloud or size. FLiP allows Java and Python developers to build scalable solutions that span messaging and streaming in cloud-native fashion with full monitoring.

Apache Pulsar with MQTT for Edge Computing

Today we will span from edge to any and all clouds to support data collection, real-time streaming, sensor ingest, edge computing, IoT use cases and edge AI. Apache Pulsar allows us to build computing at the edge and produce and consume messages at scale in any IoT, hybrid or cloud environment. Apache Pulsar supports MoP (MQTT on Pulsar), which allows the MQTT protocol to be used for high-speed messaging.

We will teach you to quickly build scalable open source streaming applications regardless of whether you are running in containers, pods, edge devices, VMs, on-premises servers, moving vehicles or any cloud.

Continuous SQL with Kafka and Flink

In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data.

We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips and examples of how to do this.
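
A continuous query of this kind could be sketched with PyFlink as below. The table names, topic names, columns, broker address and the 300-second threshold are all assumptions for illustration, and the example only shows a filter feeding a sink topic rather than a join. Running `run()` requires the `apache-flink` package, the Flink Kafka SQL connector JAR and a reachable Kafka broker; the import is lazy so the statements can be inspected without them.

```python
# Source table backed by a Kafka topic (names and schema are assumptions).
SOURCE_DDL = """
CREATE TABLE transit_events (
    route_id STRING,
    delay_seconds INT,
    event_time TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'transit-events',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'continuous-sql-demo',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
)
"""

# Sink table backed by a second Kafka topic.
SINK_DDL = """
CREATE TABLE delayed_routes (
    route_id STRING,
    delay_seconds INT,
    event_time TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'delayed-routes',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format' = 'json'
)
"""

# Continuous query: every matching event is written to the sink as it arrives.
CONTINUOUS_INSERT = """
INSERT INTO delayed_routes
SELECT route_id, delay_seconds, event_time
FROM transit_events
WHERE delay_seconds > 300
"""


def run():
    """Register both tables and start the continuous INSERT job."""
    from pyflink.table import EnvironmentSettings, TableEnvironment  # lazy import

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    for statement in (SOURCE_DDL, SINK_DDL):
        t_env.execute_sql(statement)
    # wait() blocks while the streaming job runs.
    t_env.execute_sql(CONTINUOUS_INSERT).wait()
```

The same three statements can also be pasted into the Flink SQL client directly; the Python wrapper is only one way to submit them.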

Apache NiFi 101: Introduction and Best Practices

https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
https://github.com/tspannhw/EverythingApacheNiFi
https://www.datainmotion.dev/2020/12/basic-understanding-of-cloudera-flow.html
https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html

In this talk, we will walk step by step through Apache NiFi from the first load to first application. I will include slides, articles and examples to take away as a Quick Start to utilizing Apache NiFi in your real-time dataflows. I will help you get up and running locally on your laptop, Docker or in CDP Public Cloud.

I will cover:
Terminology
Flow Files
Version Control
Repositories
Basic Record Processing
Provenance
Backpressure
Prioritizers
System Diagnostics
Processors
Process Groups
Scheduling and Cron
Bulletin Board
Relationships
Routing
Tasks
Networking
Basic Cluster Architecture
Listeners
Controller Services
Remote Ports
Handling Errors
Funnels

Real-Time Streaming in Any and All Clouds, Hybrid and Beyond

Description
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at scale and as events arrive.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet.

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Tags
AI + Machine Learning Databases Developer Tools Hybrid Integration Internet of Things

Real-Time Streaming in Azure

Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at scale and as events arrive.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet.

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Pack Your Bags, We’re Going on a Data Journey!

This three-hour workshop is aimed at organizations that have embarked (or are about to embark) on their data journey, and are looking for guidance on best practices, tools, and recommendations on navigating through the full data science lifecycle from collection to visualization.

Participants will be exposed to a variety of speakers and data experts to illuminate the critical elements that go into making their data journey a success. The session will kick off with a keynote speaker that will provide an overview of the data journey, followed by a hands-on demonstration highlighting the various personas needed in a data team participating in this journey. The demo will also showcase some of the open-source tools used by experts in the field, while using datasets and use cases relevant to nonprofits. Finally, participants will rotate between breakout sessions to further explore each of these tools and personas, and to give them an opportunity to speak with data specialists who can help address their specific data questions and challenges.

Participants will leave this interactive workshop armed with a stronger understanding and a roadmap to embark on their data journey successfully. We will also be incorporating best practices and learnings from our successful workshop at NetHope 2019.

Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks

Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately using the all Apache Mm FLaNK stack we can do this with ease! Streaming AI Powered Analytics From the Edge to the Data Center is now a simple use case. With MiNiFi we can ingest the data, do data checks, cleansing, run machine learning and deep learning models and route our data in real-time to Apache NiFi and/or Apache Kafka for further transformations and processing. Apache Flink will provide our advanced streaming capabilities fed real-time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in Apache Kudu via Apache NiFi for final SQL analytics.

Tools:
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

Source Code: https://github.com/tspannhw/MmFLaNK

Using the Mm FLaNK Stack for Edge AI (Flink, NiFi, Kafka, Kudu)

Introducing the FLaNK stack which combines Apache Flink, Apache NiFi, Apache Kafka and Apache Kudu to build fast applications for IoT, AI, rapid ingest.

FLaNK provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.

https://www.flankstack.dev/

Tools
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS

References
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html

CloudX 2024 Sessionize Event

November 2024 Santa Clara, California, United States

Budapest Data+ML Forum 2024 Sessionize Event

June 2024 Budapest, Hungary

Open Source Analytics Conference 2023 Sessionize Event

December 2023

JCON WORLD 2023 Sessionize Event

November 2023

Pulsar Summit North America 2023 Sessionize Event

October 2023 San Francisco, California, United States

AI DevWorld 2023 Sessionize Event

October 2023 Santa Clara, California, United States

WeAreDevelopers World Congress 2023 Sessionize Event

July 2023 Berlin, Germany

Big Data Fest by SoftServe Sessionize Event

May 2023

Pulsar Virtual Summit Europe 2023 Sessionize Event

May 2023

Real-Time Analytics Summit 2023 Sessionize Event

April 2023 San Francisco, California, United States

Devnexus 2023 Sessionize Event

April 2023 Atlanta, Georgia, United States

ScyllaDB Summit 2023 Sessionize Event

February 2023

Pulsar Summit Asia 2022 Sessionize Event

November 2022

2022 All Day DevOps Sessionize Event

November 2022

AI DevWorld 2022 Sessionize Event

October 2022 San Jose, California, United States

Data on Kubernetes Day @ Kubecon / CloudNativeCon NA 2022 Sessionize Event

October 2022 Detroit, Michigan, United States

Current 2022: The Next Generation of Kafka Summit Sessionize Event

October 2022 Austin, Texas, United States

JConf.dev 2022 Sessionize Event

September 2022 Chicago, Illinois, United States

Cloud Lunch and Learn Sessionize Event

July 2022

SQLBits 2022 Sessionize Event

March 2022 London, United Kingdom

Elastic Community Conference 2022 Sessionize Event

February 2022

Scylla Summit 2022 Sessionize Event

February 2022

DeveloperWeek 2022 Sessionize Event

February 2022 Oakland, California, United States

GDG DevFest UK & Ireland Sessionize Event

January 2022 London, United Kingdom

DataMinutes #2 Sessionize Event

January 2022

Pulsar Summit Asia 2021 Sessionize Event

January 2022

Porto Tech Hub Conference 2021 Sessionize Event

November 2021

Automation + DevOps Summit Sessionize Event

November 2021 Nashville, Tennessee, United States

PASS Data Community Summit 2021 Sessionize Event

November 2021

API World 2021 Sessionize Event

October 2021

InfluxDays North America Virtual Experience 2021 Sessionize Event

October 2021

AI DevWorld 2021 Sessionize Event

October 2021

Big Mountain Data and Dev Conference Sessionize Event

October 2021

Northern VA CodeCamp Fall 2021 Sessionize Event

October 2021

DBCC International 2021 Sessionize Event

October 2021

Scenic City Summit 2021 Sessionize Event

September 2021

ApacheCon Global

September 2021 New Orleans, Louisiana, United States

Music City Tech 2021 Sessionize Event

September 2021

WorldFestival 2021 Sessionize Event

August 2021

ApacheCon Asia

FLaNK

August 2021 Tokyo, Japan

AI and IoT Bulgaria Summit 2021 Sessionize Event

June 2021 Sofia, Bulgaria

DeveloperWeek Europe 2021 Sessionize Event

April 2021

AI DevWorld 2020 Sessionize Event

October 2020 San Jose, California, United States

NetHope Global Summit 2020 Sessionize Event

October 2020 New York City, New York, United States

Flink Forward Global Virtual 2020 Sessionize Event

October 2020
