

Francesco Tisiot
Staff Developer Advocate at Aiven.io
Verona, Italy
Francesco comes from Verona, Italy and works as a Staff Developer Advocate at Aiven. With his many years of experience as a data engineer, he has stories to tell and advice for data-wranglers everywhere. Francesco loves sharing knowledge with others as a speaker and writer, and is on a mission to defend the world from bad Italian food!
Kickstart your Kafka with Faker Data
We all love to play with the shiny toys, but an event stream with no events is a sorry sight. In this session you’ll see how to create your own streaming dataset for Apache Kafka using Python and the Faker library. You’ll learn how to create a random data producer and define the structure and rate of its message delivery. Randomly-generated data is often hilarious in its own right, and it adds just the right amount of fun to any Kafka project and its integrations!
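To give a flavour of the session, here is a minimal sketch of such a producer, assuming the faker and kafka-python packages and a broker on localhost:9092; the topic name and message fields are purely illustrative:

```python
import json
import time

from faker import Faker
from kafka import KafkaProducer

fake = Faker()
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Define the structure of each randomly generated message.
    message = {
        "name": fake.name(),
        "address": fake.address(),
        "pizza": fake.random_element(elements=("margherita", "diavola", "capricciosa")),
    }
    producer.send("fake_orders", message)
    time.sleep(1)  # define the delivery rate: one event per second
```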
Solving the knapsack problem with recursive queries and PostgreSQL
Optimization problems are everywhere, from deciding which clothes to pack in our luggage (aka the knapsack problem) to selecting the tasks to work on during a sprint. Trying to solve these types of problems by hand is tedious and often results in sub-optimal decisions.
In this talk, we'll see how PostgreSQL recursive queries can help. Starting from a proper problem definition, we'll explore how to build queries that call themselves recursively, the risks associated with this approach, and the safeguards we can set to optimise performance. Finally, we'll demonstrate how two new features released in PostgreSQL 14 make handling recursive statements easier.
If you're into PostgreSQL and eager to understand how recursion works, this session is for you!
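As a taste of the mechanics (not the full knapsack solution), here is a minimal sketch of a recursive query run from Python, assuming psycopg2 and a reachable PostgreSQL instance; the connection string is a placeholder, and note the WHERE clause acting as the recursion safeguard:

```python
import psycopg2

RECURSIVE_QUERY = """
WITH RECURSIVE counter(n) AS (
    SELECT 1                   -- non-recursive term: the starting row
    UNION ALL
    SELECT n + 1 FROM counter  -- recursive term: the query calls itself
    WHERE n < 10               -- safeguard: bounds the recursion depth
)
SELECT n FROM counter;
"""

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(RECURSIVE_QUERY)
        for (n,) in cur.fetchall():
            print(n)
```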
Practical tips and tricks for Apache Kafka messages integration
Interacting with Apache Kafka seems straightforward at first: you “just” push and pull messages. Yet it can quickly become a source of frustration as the user encounters timeouts, vague error descriptions and disappearing messages. Experience helps a lot, and I’m here to share what I know.
In this talk you will learn the tips & tricks I wish I had known at the beginning of my Apache Kafka journey. We’ll discuss topics like producer acknowledgments and server and consumer parameters (auto_offset_reset anyone?) that are commonly overlooked, causing developers lots of pain. I’ll share how to generate code that works as expected on the first run, making your first integration painless. These tips will kickstart your Apache Kafka experience in Python and save you hours of debugging.
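For a preview, here is a sketch of the two settings that most often bite beginners, assuming the kafka-python client and a broker on localhost:9092; topic and group names are placeholders:

```python
from kafka import KafkaConsumer, KafkaProducer

# acks="all" waits for all in-sync replicas to confirm each write,
# trading a little latency for not losing messages.
producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")

# auto_offset_reset="earliest" makes a brand-new consumer group start
# from the beginning of the topic instead of silently skipping history.
consumer = KafkaConsumer(
    "my_topic",
    bootstrap_servers="localhost:9092",
    group_id="my_group",
    auto_offset_reset="earliest",
)
```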
JDBC Source Connector: What could go wrong?
When you need to source database events into Apache Kafka, the JDBC source connector usually represents the first choice for its flexibility and the almost-zero setup required on the database side. But sometimes simplicity comes at the cost of accuracy, and missing events can have catastrophic impacts on our data pipelines.
In this session we'll understand how the JDBC source connector works and explore the various modes in which it can operate to load data in a bulk or incremental manner. Having covered the basics, we'll then analyse the edge cases that cause things to go wrong, like infrequent snapshot times, out-of-order events, non-incremental sequences and hard deletes.
Finally, we'll look at other approaches, like the Debezium source connector, and demonstrate how some extra configuration on the database side helps avoid problems and sets up a reliable source of events for our streaming pipeline.
Want to reliably take your database events into Apache Kafka? This session is for you!
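As a taste of the setup involved, here is a sketch of registering the connector in its incremental mode through the Kafka Connect REST API, assuming the requests package, Connect listening on localhost:8083, and placeholder table, column and credential names:

```python
import requests

connector = {
    "name": "jdbc-source-orders",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/shop",
        "connection.user": "scott",
        "connection.password": "tiger",
        "table.whitelist": "orders",
        # timestamp+incrementing is the most robust built-in mode, yet it
        # still misses hard deletes and rows committed out of order.
        "mode": "timestamp+incrementing",
        "timestamp.column.name": "updated_at",
        "incrementing.column.name": "id",
        "topic.prefix": "pg_",
    },
}

requests.post("http://localhost:8083/connectors", json=connector)
```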
I don’t want to miss a Thing 🎶 - Track Database Changes with Apache Kafka
An application with a central database and a series of ETL (Extract, Transform, Load) flows moving data from there to the data warehouse is a familiar pattern in software architecture everywhere. It works very well, but ETLs are usually single-purpose: each additional target gets its own dedicated flow, and over time the accumulated load can slow things down.
A more performant alternative is to use Kafka Connect to pick up database changes and pass them to Apache Kafka. Once the data is in Kafka, it can be reshaped and pushed to several downstream applications without creating additional load on the source system. This open source data streaming platform integrates with your existing setup, and with a bit of configuration it can replace too-much-of-a-good-thing ETL flows and bring simplicity and performance to your data pipeline.
This session will show how Apache Kafka operates and how existing data platforms, like a PostgreSQL database, can be integrated with it both as a data source and a target. Several Kafka Connect options will be explored to understand their benefits and limitations. The session is intended for everyone who wants to avoid the classic “spaghetti architecture” and base their data pipeline on Apache Kafka, the main open source data streaming technology.
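To make this concrete, here is a sketch of registering a Debezium PostgreSQL connector through the Kafka Connect REST API, assuming the requests package, Connect on localhost:8083, a database with logical replication enabled, and placeholder names and credentials (option names follow recent Debezium versions):

```python
import requests

debezium = {
    "name": "pg-cdc-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "replicator",
        "database.password": "secret",
        "database.dbname": "shop",
        # Changes are read from the write-ahead log, so inserts, updates
        # and deletes reach Kafka without extra queries on the source.
        "plugin.name": "pgoutput",
        "topic.prefix": "shop",
    },
}

requests.post("http://localhost:8083/connectors", json=debezium)
```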
Fix your Strings in PostgreSQL
Strings are one of the most used types in databases; they can store pretty much any data and don't enforce any rules on the inserted input. Yet too much freedom sometimes leads to inconsistencies: is it Aivan or Aiven? Øyvind or Oyvind? Wine or Whine?
These seemingly small differences can have bad side-effects, causing lookups to fail and incorrect aggregation results to be returned. Luckily all is not lost: PostgreSQL has some features that can help us make sense of the chaos.
In this talk you will learn what PostgreSQL has to offer: starting with pattern matching, passing by regular expressions, and ending with the more advanced functionality exposed by the fuzzystrmatch and unaccent extensions. I'll demonstrate which tools can help you fix string inconsistencies and how to avoid making the same mistakes again in the future. This session is recommended for anyone who deeply cares about their (string) data quality.
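As a quick preview, here is a sketch of those two extensions in action from Python, assuming psycopg2, a placeholder connection string, and that fuzzystrmatch and unaccent have already been installed with CREATE EXTENSION:

```python
import psycopg2

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        # levenshtein() counts the single-character edits between spellings.
        cur.execute("SELECT levenshtein('Aivan', 'Aiven');")
        print(cur.fetchone()[0])  # -> 1: just one substitution away

        # unaccent() strips diacritics, so Øyvind can match Oyvind.
        cur.execute("SELECT unaccent('Øyvind');")
        print(cur.fetchone()[0])
```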
Event-Driven applications: Apache Kafka and Python
Code and data go together like tomato and basil; not many applications work without moving data in some way. As our applications modernise and evolve to become more event-driven, the requirements for data are changing. In this session we will explore Apache Kafka, a data streaming platform, to enable reliable real-time data integration for your applications.
We will look at the types of problems that Kafka is best at solving, and show how to use it in your own applications. Whether you have a new application or are looking to upgrade an existing one, this session offers advice on adding Kafka using the Python libraries, with code examples (and bonus discussion of pizza toppings) to use.
With Kafka in place, many things are possible so this session also introduces Kafka Connect, a selection of pre-built connectors that you can use to route events between systems and integrate with other tools. This session is recommended for engineers and architects whose applications are ready for next-level data abilities.
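For a taste of the code examples, here is a minimal sketch of the producer and consumer pair, assuming kafka-python and a broker on localhost:9092; the topic name is illustrative, the pizza toppings are not negotiable:

```python
import json

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("pizza_orders", {"pizza": "margherita", "extra": "basil"})
producer.flush()

consumer = KafkaConsumer(
    "pizza_orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for event in consumer:
    print(event.value)  # the application reacts to each event as it arrives
```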
Breathe in, breathe out: get Kafka Connect configs right!
Kafka Connect is the spell book for creating magical data streaming setups. It allows us to integrate Apache Kafka with the rest of our data ecosystem and get all the data flowing to the right place. However, configuring all the weird and wonderful connectors out there can require some rather dark magic, and this talk will teach you the tricks you need.
We'll talk about streaming data into topics, the data formats to use and what to look out for when Kafka Connect is plugging data from another platform into your setup. Since we don't live in a perfect world, we'll also cover configurations like error tolerance, dead letter queues and single message transforms that can make things more robust. You'll see some examples of good practices, and hear some stories about how I learned a few of these things the hard way.
Finally we'll shed light on some of the options, like auto evolution, that seem like a great idea when you are prototyping a new solution but which can store up problems for the longer term. If you are ready to make magic with Kafka Connect and the Apache Kafka ecosystem, this is the talk for you!
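As a flavour of those good practices, here is a sketch of the robustness settings in question, expressed as the configuration fragment you would merge into a connector definition; topic and field names are placeholders:

```python
# Fragment to merge into a (sink) connector's "config" section.
robustness = {
    # Tolerate bad records instead of killing the whole connector task...
    "errors.tolerance": "all",
    # ...but keep the failures inspectable in a dead letter queue topic.
    "errors.deadletterqueue.topic.name": "dlq_orders",
    "errors.deadletterqueue.context.headers.enable": "true",
    # A single message transform: strip a sensitive field before delivery.
    "transforms": "dropSecret",
    "transforms.dropSecret.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropSecret.exclude": "credit_card",
}
```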
Apache Kafka and Flink: Stateful Streaming Data Pipelines made easy with SQL
A stateful streaming data pipeline needs both a solid base and an engine to drive the data. Apache Kafka is an excellent choice for storing and transmitting high-throughput, low-latency messages. Apache Flink adds the cherry on top with a distributed stateful compute engine available in a variety of languages, including SQL.
In this session we'll explore how Apache Flink operates in conjunction with Apache Kafka to build stateful streaming data pipelines, and the problems we can solve with this combination. We will explore Flink's SQL client, showing how to define connections and transformations with the best-known and most beloved language in the data industry.
This session is aimed at data professionals who want to lower the barrier to streaming data pipelines by making them configurable as a set of simple SQL commands.
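To give an idea of how little is needed, here is a sketch of a Kafka-backed table and a stateful aggregation in PyFlink, assuming the apache-flink package plus the Kafka SQL connector jar, a broker on localhost:9092, and illustrative topic and field names:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# The connection is declared in SQL, just like in Flink's SQL client.
t_env.execute_sql("""
    CREATE TABLE pizza_orders (
        client_name STRING,
        pizza STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'pizza_orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# A stateful transformation: a continuously updated count per pizza.
t_env.execute_sql(
    "SELECT pizza, COUNT(*) AS orders FROM pizza_orders GROUP BY pizza"
).print()
```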
⚡ talk: No pineapple on pizza! Streaming anomaly detection with Apache Kafka and Apache Flink
There's a rule in Italy that states: "pineapple doesn't belong on pizza". Yet it's a common choice around the world and a big discussion topic online.
We'll use this funny example to show the power of the best open source streaming duo: Apache Kafka and Apache Flink. We will initially showcase how data can flow in streaming mode through Kafka topics, and then add Flink on top to detect anomalies (yep, pineapple, I'm looking at you), calculate aggregations, and enrich our pipelines with data coming from external systems like a PostgreSQL database.
If you want to see the creation of a streaming data pipeline for anomaly detection in 10 minutes, this talk is for you.
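If you'd rather read than watch, here is a compressed sketch of the idea in PyFlink, under the same assumptions as the previous session's example (apache-flink package, Kafka SQL connector jar, broker on localhost:9092), with the "anomaly" rule kept deliberately silly:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE pizza_orders (
        client_name STRING,
        pizza STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'pizza_orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# The anomaly detector: flag every order that dares to include pineapple.
t_env.execute_sql(
    "SELECT client_name, pizza FROM pizza_orders "
    "WHERE pizza LIKE '%pineapple%'"
).print()
```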
Current 2023: The Next Generation of Kafka Summit
Current 2022: The Next Generation of Kafka Summit
NDC London 2022