Speaker

David Pilato

Developer | Evangelist @ elastic

Cergy, France


David Pilato discovered the Elasticsearch project in 2011. After contributing to the project and creating open source plugins for it, David joined Elastic, the company, in 2013, where he is a Developer and Evangelist. He also created, and still actively manages, the French-speaking user group. At Elastic, he has mainly worked on the Elasticsearch source code, specifically on open-source plugins. In his free time, he likes talking about Elasticsearch at conferences or in companies (Brown Bag Lunches, AKA BBLs - https://www.elastic.co/blog/free-lunch-for-open-source-engineers). He is also the author of the FSCrawler project (https://github.com/dadoonet/fscrawler), which helps index your PDF, Open Office, and other documents in Elasticsearch, using Apache Tika behind the scenes.

Since 2013, David Pilato has been a developer and evangelist at elastic.co, after spending the previous two years promoting the open-source Elasticsearch project. He leads its French community and organizes BBLs (https://www.elastic.co/blog/free-lunch-for-open-source-engineers) within companies. He is also the author of the FSCrawler project (https://fscrawler.readthedocs.io/), which indexes PDF, Open Office, and other documents in Elasticsearch using Apache Tika.

Area of Expertise

Information & Communications Technology

Sessions

Search: a new era en fr

Search is no longer just traditional TF/IDF: the current wave of machine learning models has opened another dimension for search.

This talk gives an overview of:

* "Classic" search and its limitations
* What is a model and how can you use it
* How to use vector search or hybrid search in Elasticsearch
* Where OpenAI's ChatGPT or similar LLMs come into play with Elastic

The main demo covers how to generate embeddings from music and then use the techniques we learned to suggest the most probable match when we hum a song 🎶🎸🎻.
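To make the idea of vector search concrete, here is a minimal sketch, assuming each song is already represented by a small embedding vector. It ranks stored vectors by cosine similarity with a brute-force scan. The class name, the three-dimensional vectors, and the `nearest` helper are made up for illustration; this is not how Elasticsearch implements kNN search, which relies on approximate indexes rather than a full scan.

```java
class VectorSearchSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns NaN for a zero vector. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the stored vector most similar to the query (brute-force scan). */
    static int nearest(double[] query, double[][] corpus) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < corpus.length; i++) {
            double score = cosine(query, corpus[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical embeddings: real ones come from a model and have hundreds of dimensions.
        double[][] songs = {
            {0.9, 0.1, 0.0},
            {0.1, 0.8, 0.3},
            {0.0, 0.2, 0.9}
        };
        double[] hummed = {0.2, 0.7, 0.4};
        System.out.println("closest song index: " + nearest(hummed, songs));
    }
}
```

A hybrid search would combine a score like this one with a term-based score; how the two are merged is a separate design choice that this sketch does not cover.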

The laptop sound needs to be sent to the audience.

The magic potion to boost your career en fr

The recipe for the magic potion is transmitted only to druids, normally. But exceptionally, the Council of the Druids of the Carnutes Forest has authorized me to reveal to you some of the ingredients that make up this beverage.

I may even tell you the secret ingredient!

Indexing your office documents with Elastic stack and FSCrawler en fr

You have plenty of documents (Open Office, Microsoft Office, PDF, images...) and you may want to search their metadata and content. How can you do that?

In this talk, David will explain how Apache Tika can be used for that and how to combine this fantastic library with the Elastic Stack:

* Elasticsearch [ingest-attachment processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/attachment.html)
* [FSCrawler](https://github.com/dadoonet/fscrawler)
* [Workplace Search](https://www.elastic.co/workplace-search) connector for FSCrawler, to get a ready-to-use and powerful user interface for your documents.

We will run hybrid search on our document set with AI-powered models, so we will be able to run semantic search in addition to traditional term-based search, and we will ask OpenAI to provide a comprehensive answer to the questions we are asking.

I'm the author of the FSCrawler project, which now has a significant number of stars on GitHub (~1k).
After 10 years of development, I realized that I have not been speaking about this topic, although a lot of people are trying to find a free and open solution to search across their enterprise documents located on a file system.

This session explains how it works under the hood and what the roadmap is.

20% slides, 80% demos

Testcontainers for Real Integration Tests with Elasticsearch en fr

How are you testing with your database?

Mocking is not an option since you want to test the actual system.
In-memory databases, like H2 or HSQLDB, have subtle differences and not all datastores have in-memory cousins.
Managing and running tests in parallel against the actual datastore is a pain.
So what is the solution? There are some very neat solutions based on containers, namely the Docker-Maven-Plugin and Testcontainers. From your tests you can start a lightweight, throwaway instance of your datastore and this talk will walk you through how to do that.

And we will introduce the module we built for Elasticsearch: https://www.testcontainers.org/usage/elasticsearch_container.html.

Testcontainers pour de vrais tests d'intégration d'Elasticsearch en fr

Integration tests can become a nightmare when they are launched from the same JVM as your code:

* JAR conflicts (JAR Hell)
* Security Manager
* Side effects

Moreover, testing a product that is started differently from the way it is started in production will never guarantee that the integration tests are trustworthy.

So, after discovering the Testcontainers project (https://www.testcontainers.org/), which starts Docker containers, I decided to write an implementation for Elasticsearch: testcontainers-java-module-elasticsearch (https://www.testcontainers.org/modules/elasticsearch/).
Come and discover all of this during this session.

Indexer ses documents bureautique avec la suite Elastic et FSCrawler en fr

You have tons of Open Office, Microsoft Office, and PDF documents, or even images, at hand, and you would like to be able to search their metadata and the content itself. How can you do that? Especially since the announcement of the end of Google Search Appliance.

In this session, David will explain how Apache Tika can provide this service and how to combine this fantastic library with Elasticsearch:

* Elasticsearch [ingest-attachment processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/attachment.html)
* [FSCrawler](https://github.com/dadoonet/fscrawler)
* [Workplace Search](https://www.elastic.co/workplace-search) connector for FSCrawler, to get an off-the-shelf, powerful user interface for your documents.

We will also run hybrid searches on our document set using AI models, so we will be able to do semantic search in addition to "traditional term-based search". Finally, we will ask OpenAI to provide an intelligible answer to the questions we have just asked.

Although I have been the author of the project for more than 10 years, I have never really promoted it, even though it is becoming fairly popular (about 1,000 stars on GitHub).
It is time to fix that, especially since it provides useful solutions for fairly common enterprise use cases, namely how to index content such as PDF documents, Open Office documents, ...

Format: 20% slides and 80% demo.

I will explain the different coding strategies I went through, such as moving from a Maven monolith to a multi-module project, introducing Docker for integration tests, and the "directory watching" mechanism I implemented, along with the flaws of such an implementation and the future of the project.

Un moteur de recherche NoSQL pour chercher^H^H^H^H^H^H^H^H trouver... fr

Are you still searching your data with `SELECT * FROM person WHERE name like '%david%pilato%'`?

Beyond the performance you get, are you sure you are returning the most relevant results to your users first?

Come and discover how a search engine will help you answer the questions asked by your users, relevantly and efficiently, while providing result analytics features, whatever the volume...

Advanced (elastic)search for your legacy application en fr

How do you mix SQL and NoSQL worlds without starting a messy revolution?

This live coding talk will show you how to add Elasticsearch to your legacy application without changing all your current development habits. Your application will suddenly have advanced search features, all without the need to write complex SQL code!

David will start from a Spring Boot/MySQL based application and will add a complete integration of Elasticsearch, all live from the stage during his presentation.

Enriching postal addresses with Elastic stack en fr

> Come and learn how you can enrich your existing data with normalized postal addresses with geo location points thanks to open data and [BANO project](http://bano.openstreetmap.fr/data/).

Most of the time, postal addresses from our customers or users are not very well formatted or defined in our information systems. This can become a nightmare if you are, for example, a call center employee who wants to find a customer by their address.
Imagine as well how easily a sales department could show on a map where the customers are located and where a new shop could be opened...

Let's take a simple example:

```json
{
  "name": "Joe Smith",
  "address": {
    "number": "23",
    "street_name": "r verdiere",
    "city": "rochelle",
    "country": "France"
  }
}
```

Or the opposite: I have the coordinates, but I can't tell which postal address they correspond to:

```json
{
  "name": "Joe Smith",
  "location": {
    "lat": 46.15735,
    "lon": -1.1551
  }
}
```

In this live coding session, I will show you how to solve both problems using the Elastic Stack, with a strong focus on Logstash and Elasticsearch.

Enrichir ses adresses postales avec la suite Elastic en fr

> Discover how to enrich your data with normalized, geolocated postal addresses thanks to open data and the BANO project.

Often, the postal addresses of our customers or users are very poorly formatted in our information systems. As a result, for a customer service team or a call center that wants to find a customer by their address, things get fairly complicated.
Likewise, how do you answer the sales department when it wants to show on a map where the customers are physically located, or where a new shop could be opened, ...

Let's take a simple case:

```json
{
  "name": "Joe Smith",
  "address": {
    "number": "23",
    "street_name": "r verdiere",
    "city": "rochelle",
    "country": "France"
  }
}
```

Or the opposite. I have coordinates, but I can't tell which address they correspond to:

```json
{
  "name": "Joe Smith",
  "location": {
    "lat": 46.15735,
    "lon": -1.1551
  }
}
```

This session, without slides, will show you how to solve these problems using the Elastic Stack, and Logstash and Elasticsearch in particular.

Recherche avancée pour votre application "legacy" en fr

How do you mix SQL and NoSQL without starting a revolution?

This live coding talk will show you how to add Elasticsearch to your existing application without changing your habits. You will get advanced search features without having to write complex SQL!

David will start from a Spring Boot/MySQL application and add Elasticsearch live from the stage!

Visualize Your Threats with Elastic SIEM en fr

Knowing what is going on in your environment is an important part of staying on top of security issues. But how do you capture relevant metrics and visualize them? One widely used tool for that job is the Elastic Stack, formerly known as the ELK stack. This talk shows how to ingest relevant metrics from your network and hosts, as well as how to easily visualize them to find suspicious patterns and behaviors. We will also be using the latest tool, named SIEM.

We will use real-world honeypot data for this example:

* The first step is to parse and enrich the data, so we can identify actual attacks, their origin, and more.
* Then we store and explore the data to find meaningful insights.
* Which leads us to visualize specific attributes — like the location of an attacker or patterns in the attacks.
* Building upon this we can combine visualizations into dashboards, giving a broader overview.
* Finally, we will use the Kibana SIEM app to see how easy it has now become to track attacks.

Everything done live.

Identifier les menaces avec Elastic SIEM en fr

Knowing what is going on in your environment is an important part of staying informed about security issues. But how do you capture and visualize the relevant information? One open source tool is used worldwide for this: the Elastic Stack. This hands-on talk will show you how to ingest useful data from your network layer, your machines, and your logs, and how to easily visualize it in order to identify suspicious patterns and behaviors. For this, we will notably use the latest SIEM tool of the Elastic Stack.

We will use "honeypot" data for this:

* The first step is to read, extract, and enrich the data in order to identify the attacks, their source, and more.
* Then we store and explore the collected data to find relevant indicators.
* This leads us to create visualizations specific to our needs, for example the location of the attacker or typical attack patterns.
* Then we combine these visualizations into a dashboard that consolidates the information.
* Finally, we use the SIEM application to see how all this search and analysis is now greatly simplified.

All of this live.

La recherche à l'ère de l'IA en fr

Search no longer relies only on the now-traditional approach based on term frequency (TF/IDF or BM25), but increasingly on the current machine learning trend, where new models have opened a new dimension for search.

This talk gives an overview of:

* "Classic" search and its limitations
* What a machine learning model is and how you can use it
* How to use vector search or hybrid search in Elasticsearch
* How OpenAI's ChatGPT or similar large language models (LLMs) naturally come into play with Elastic

The main demo shows how to generate embeddings from music and then how to find the music closest to a tune that we hum 🎶🎸🎻.

The PC sound must be played through the speakers.

La potion magique pour faire progresser ta carrière en fr

The recipe for the magic potion is normally passed on only to druids. But exceptionally, the Council of the Druids of the Carnutes Forest has authorized me to reveal to you some of the ingredients that make up this beverage.

I may even tell you the secret ingredient!

In this session without slides, I tell how open source has completely accelerated my career. The ideas:

* contribute in one way or another,
* don't be afraid,
* open-source yourself!

Elasticsearch Query Language: ES|QL en fr

Elasticsearch and Kibana added a brand new query language: ES|QL — coming with a new endpoint (`_query`) and a simplified syntax. It lets you refine your results one step at a time and adds new features like data enrichment and processing right in your query. And you can use it across the Elastic Stack — from the Elasticsearch API to Discover and Alerting in Kibana. But the biggest change is behind the scenes: Using a new compute engine that was built with performance in mind.
Join us for an overview and a look at syntax and internals.
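As a rough illustration of the "one step at a time" style, the sketch below assembles a piped ES|QL query from individual steps and wraps it in the JSON body shape sent to the `_query` endpoint. The index name `web_logs`, the fields `status` and `host`, and the helper names are assumptions made up for this example; the code only builds strings and does not call Elasticsearch.

```java
import java.util.List;

class EsqlRequestSketch {

    /** Joins processing steps with the ES|QL pipe character. */
    static String pipeline(List<String> steps) {
        return String.join(" | ", steps);
    }

    /** Wraps a query in a {"query": "..."} JSON body, escaping backslashes and double quotes. */
    static String requestBody(String esql) {
        String escaped = esql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\": \"" + escaped + "\"}";
    }

    public static void main(String[] args) {
        // Hypothetical index and field names, chosen only for illustration.
        String esql = pipeline(List.of(
                "FROM web_logs",
                "WHERE status >= 500",
                "STATS errors = COUNT(*) BY host",
                "SORT errors DESC",
                "LIMIT 5"));
        System.out.println("POST /_query");
        System.out.println(requestBody(esql));
    }
}
```

Each element of the list refines the output of the previous one, which is what makes the query easy to build up and to read from top to bottom.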

Elasticsearch Query Language: ES|QL en fr

Elasticsearch et Kibana apportent un tout nouveau langage, ES|QL, avec une nouvelle API (`_query`) et une syntaxe simplifiée. Cela vous permet d'affiner vos résultats, étape par étape et ajouter de nouvelles fonctionnalités comme par exemple l'enrichissement de données et la transformation à la volée, directement dans votre requête. Et vous pouvez l'utiliser sur toute la plateforme Elastic — depuis les API Elasticsearch jusqu'aux fonctions de "Discover" et d'"Alerting" de Kibana. Mais le changement principal n'est pas celui que vous verrez : les ingénieurs ont développé un tout nouveau moteur de calcul, construit avec la performance comme guide.
Venez découvrir un aperçu de ce nouveau moteur avec découverte de la syntaxe et du fonctionnement interne.

do MORE with stateLESS Elasticsearch en fr

How would you build Elasticsearch if it was started in 2024? Decouple compute and storage, outsource the persistence to a blob store like S3, dynamically scale up and down, have the right defaults, and a clear path for developers. This is what we have done!

In this talk, learn how we have redesigned Elasticsearch to do more with a stateless architecture that can run hot queries on cold storage. And see how you can get started with it today.

do MORE with stateLESS Elasticsearch en fr

How would you build Elasticsearch if you started this project in 2025?

* Decouple compute from storage
* Outsource persistence and replication to a blob store such as S3, Google Cloud Storage, or Azure Blob Storage
* Dynamically add or remove instances
* Have the right defaults
* And a very clear and smooth path for developers

This is exactly what we have done with Elastic Serverless.

In this session, you will discover how we redesigned Elasticsearch so that it can do more with a stateless architecture that can run queries on cold storage.

Randomized testing: Gotta Catch ‘Em All en fr

> Chance does things well.

If we apply this idea to unit tests or integration tests, we can make our tests much more unpredictable — and as a result, uncover issues that our minds would never have dared to imagine! For example, I recently discovered a [bug](https://github.com/gestalt-config/gestalt/issues/242) in a configuration management library that occurs when the `Locale` is set to `AZ`. 🤦🏼‍♂️

Another, even simpler, example:

```java
int input = generateInteger(Integer.MIN_VALUE, Integer.MAX_VALUE);
int output = Math.abs(input);
```

This can generate `-2147483648`... which is quite unexpected for an absolute value! 😉
Randomized tests can uncover these twisted edge cases... That’s what the Elasticsearch team has been doing for years using the [RandomizedTesting](https://labs.carrotsearch.com/randomizedtesting.html) framework to test all their Java code.
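The sketch below shows the idea with nothing but `java.util.Random`; it does not use the RandomizedTesting framework's API. It assumes a hypothetical generator, `randomIntBiased`, that returns boundary values about 10% of the time, because a purely uniform generator would almost never produce `Integer.MIN_VALUE`. The seed is printed so that a failing run can be replayed.

```java
import java.util.Random;

class RandomizedAbsSketch {

    // Boundary values that a purely uniform generator would almost never hit.
    private static final int[] EDGE_CASES = {Integer.MIN_VALUE, Integer.MAX_VALUE, -1, 0, 1};

    /** Hypothetical generator: an edge case about 10% of the time, otherwise a uniform int. */
    static int randomIntBiased(Random random) {
        if (random.nextInt(10) == 0) {
            return EDGE_CASES[random.nextInt(EDGE_CASES.length)];
        }
        return random.nextInt();
    }

    /** Absolute value that cannot overflow, because the int is widened to long first. */
    static long safeAbs(int value) {
        return Math.abs((long) value);
    }

    /** Returns the first generated input whose Math.abs is negative, or null if none was found. */
    static Integer firstFailingInput(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            int input = randomIntBiased(random);
            if (Math.abs(input) < 0) {
                return input;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long seed = System.nanoTime(); // printed below so that a failure can be reproduced
        Integer failing = firstFailingInput(seed, 10_000);
        System.out.println("seed=" + seed + " failing input=" + failing);
        System.out.println("safeAbs(Integer.MIN_VALUE)=" + safeAbs(Integer.MIN_VALUE));
    }
}
```

Widening to `long` is only one possible fix; on Java 15 and later, `Math.absExact` throws an `ArithmeticException` instead of silently returning a negative number.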

Add to that real integration tests using [TestContainers](https://java.testcontainers.org/modules/elasticsearch/), and you’ll have a complete approach to tests that *regularly fail*!

After this talk, you’ll never look at the `random()` function the same way again — and you’ll discover how (bad) luck can actually help you! 🍀

Le hasard fait bien les tests en fr

> Chance does things well.

If we apply this idea to unit tests or integration tests, we can make our tests much more unpredictable and, as a result, find problems that our minds would never have dared to imagine! For example, I recently discovered, in a configuration management library, [a bug](https://github.com/gestalt-config/gestalt/issues/242) that occurs when the `Locale` is set to `AZ`. 🤦🏼‍♂️

Another, even simpler, example:

```java
int input = generateInteger(Integer.MIN_VALUE, Integer.MAX_VALUE);
int output = Math.abs(input);
```

This can generate `-2147483648`... which is quite unexpected for an absolute value! 😉
Randomized tests can uncover these twisted cases... This is what the Elasticsearch team has had in place for several years, using the [RandomizedTesting](https://labs.carrotsearch.com/randomizedtesting.html) framework to test all the Java code.

Add to that real integration tests using [TestContainers](https://java.testcontainers.org/modules/elasticsearch/) and you will have a complete approach to tests that fail regularly!

After this talk, you will never see the `random()` function the same way again, and you will discover how (bad) luck can help you! 🍀


