Indexing your office documents with Elastic Stack and FSCrawler

You have plenty of documents (OpenOffice, Microsoft Office, PDF, images, and so on) and you want to search both their metadata and their content. How can you do that?

In this talk, David will explain how Apache Tika can be used for that and how to combine this fantastic library with the Elastic Stack:

* Elasticsearch [ingest-attachment processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/attachment.html)
* [FSCrawler](https://github.com/dadoonet/fscrawler)
* [Workplace Search](https://www.elastic.co/workplace-search) connector for FSCrawler, to get a ready-to-use and powerful user interface for your documents.
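
As a rough illustration of the first item in the list, here is a minimal sketch of the two JSON bodies involved when using the attachment processor: a pipeline definition and a document whose binary content is base64-encoded. The field name `data`, the selected `properties`, and the helper function names are assumptions made for this example, not something taken from the talk. The sketch only builds and prints the bodies; sending them to a cluster (for instance with `PUT _ingest/pipeline/<name>`, then indexing with `?pipeline=<name>`) is left out.

```python
import base64
import json


def build_attachment_pipeline(source_field: str = "data") -> dict:
    """Build the body of an ingest pipeline that uses the attachment processor.

    `source_field` is a hypothetical field name holding the base64 payload.
    """
    return {
        "description": "Extract text and metadata with the attachment processor",
        "processors": [
            {
                "attachment": {
                    "field": source_field,
                    # Limit extraction to a few of the supported properties.
                    "properties": ["content", "title", "content_type"],
                }
            }
        ],
    }


def build_document(raw_bytes: bytes, source_field: str = "data") -> dict:
    """Build an index request body; the processor expects base64-encoded text."""
    return {source_field: base64.b64encode(raw_bytes).decode("ascii")}


if __name__ == "__main__":
    print(json.dumps(build_attachment_pipeline(), indent=2))
    print(json.dumps(build_document(b"Hello from a plain-text file")))
```

With this approach the binary file travels to Elasticsearch and extraction happens on the ingest node, whereas a crawler such as FSCrawler runs the extraction on the client side before indexing.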

We will run hybrid search on our document set with AI-powered models, so we can run semantic search in addition to traditional term-based search, and we will ask OpenAI to provide a comprehensive answer to the questions we are asking.
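
To make the idea of hybrid search more concrete, here is a hedged sketch of a search request body that combines a term-based `match` query with a `knn` clause in a single Elasticsearch `_search` call. The abstract does not say which request shape or model the demo uses, so the field names `content` and `content_embedding`, the model id `my-embedding-model`, and the helper function name are all hypothetical. The sketch only builds the body and does not contact a cluster.

```python
import json


def build_hybrid_search(
    question: str,
    text_field: str = "content",               # hypothetical text field
    vector_field: str = "content_embedding",   # hypothetical dense_vector field
    model_id: str = "my-embedding-model",      # hypothetical deployed model id
    k: int = 10,
) -> dict:
    """Build a _search body mixing term-based and vector (kNN) retrieval."""
    return {
        # Traditional term-based part of the search.
        "query": {"match": {text_field: question}},
        # Semantic part: the question is embedded by the model at query time.
        "knn": {
            "field": vector_field,
            "k": k,
            "num_candidates": 10 * k,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": question,
                }
            },
        },
        "size": k,
    }


if __name__ == "__main__":
    body = build_hybrid_search("How do I configure the crawler?")
    print(json.dumps(body, indent=2))
```

The top hits from such a request could then be passed as context to a language model to produce the final answer; that step is not shown here.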

I'm the author of the FSCrawler project, which now has a significant number of stars on GitHub (~1k).
After 10 years of development, I realized that I have not been speaking about this topic, although a lot of people are trying to find a free and open solution to search across their enterprise documents located on a file system.

This session explains how it works under the hood and what the roadmap is.

20% slides, 80% demos

David Pilato

Developer | Evangelist @ elastic

Cergy, France
