Session
Multimodal RAGs: Unlocking the Power of Multimodality with PaliGemma
In this session, I’ll dive into the concepts and applications of multimodality, showing how combining multiple forms of data can enhance AI model capabilities. I’ll walk through the process of training PaliGemma on Colab/Kaggle, highlighting each step from data preparation to fine-tuning. After the training phase, I’ll demonstrate how to use Hugging Face Spaces for inference and deployment, providing a practical approach to bringing Vision-Language Models (VLMs) into real-world applications. The session serves as a comprehensive guide for anyone interested in building and deploying multimodal AI solutions, showing how PaliGemma, a state-of-the-art VLM, is trained and put to work. By the end, participants will have a solid understanding of how multimodal RAGs work and how to bring them to production efficiently.
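As a rough illustration of the inference flow the session covers, the sketch below loads a PaliGemma checkpoint through the Hugging Face transformers library and captions a single image. The model id, image path, and prompt here are illustrative placeholders, not the exact code presented in the session, and the actual training and deployment steps shown on Colab/Kaggle and Hugging Face Spaces may differ.

```python
# Minimal PaliGemma inference sketch (assumes a transformers version with PaliGemma support).
# Model id, image path, and prompt are illustrative placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # example public checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

image = Image.open("example.jpg")   # any RGB image
prompt = "caption en"               # PaliGemma task-style prompt
inputs = processor(text=prompt, images=image, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=50)

# Strip the prompt tokens before decoding the generated caption.
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```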
Shubham Agnihotri
Senior Manager - Generative AI - IDFC Bank
Mumbai, India