Session
Multimodal Search with Open-Source Tools
A recent and exciting development in generative AI is the use of language to understand images, video, and sound. One example is multimodal retrieval: using one modality, such as text, to search another, such as images. Beyond powering search engines across media types, it is also useful for grounding LLMs in factual data and reducing hallucinations. In this talk, I explain how to build a simple but performant multimodal retrieval pipeline using entirely open-source tools and models: the Milvus vector database and HuggingFace libraries for modeling and data. I also discuss techniques for using multimodal retrieval effectively and increasing recall, along with some interesting and diverse industry applications.
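The core idea behind such a pipeline can be sketched in a few lines. The sketch below uses random NumPy vectors as stand-ins for the embeddings a CLIP-style HuggingFace model would produce, and plain cosine-similarity search in place of Milvus; all names and dimensions here are illustrative assumptions, not the speaker's actual implementation.

```python
import numpy as np

# Toy stand-in for CLIP-style embeddings: in a real pipeline these would
# come from a HuggingFace vision-language model, and the search would run
# inside Milvus with an ANN index rather than brute-force NumPy.

def normalize(v):
    """L2-normalize rows so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Pretend these are 1000 image embeddings already stored in the database.
image_embeddings = normalize(rng.normal(size=(1000, 512)))

# Pretend this is the embedding of a text query, e.g. "a photo of a cat".
# Because text and images share one embedding space, text can search images.
query_embedding = normalize(rng.normal(size=(512,)))

# Cosine-similarity search: the operation Milvus performs at scale.
scores = image_embeddings @ query_embedding
top_k = np.argsort(scores)[::-1][:5]  # indices of the 5 closest images

print("top-5 image ids:", top_k.tolist())
```

The key design point is that both modalities are embedded into a single shared vector space, so cross-modal search reduces to nearest-neighbor lookup.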
Stefan Webb
Developer Advocate, Zilliz
San Francisco, California, United States