Session
Multimodality with Gemini: Unleashing the Power of Text, Videos, Images and more
Gemini is the most capable and general model Google has ever built. It was designed from the ground up to be multimodal, meaning it can understand, operate across, and combine different types of information, including text, code, images, and video. This talk dives into the exciting world of Gemini, a cutting-edge foundation model developed by Google, and shows how its integrated handling of text, images, and other modalities enables you to:
- Analyze and understand the content of images, videos, and audio files
- Perform cross-modal tasks such as image captioning and visual question answering (a short sketch follows this list)
- Explore the potential of multimodality for applications ranging from creative content generation to advanced information retrieval
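To make the list above concrete, here is a minimal Python sketch of image captioning and visual question answering with the google-generativeai SDK. The model name, the API key placeholder, and the image file are illustrative assumptions, not details from the talk:

```python
# A minimal sketch of cross-modal prompting with the google-generativeai SDK.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumes a key from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
image = Image.open("field_photo.jpg")              # hypothetical local image

# Image captioning: a text prompt and an image in a single request
caption = model.generate_content(
    ["Write a one-sentence caption for this image.", image]
)
print(caption.text)

# Visual question answering: a question grounded in the same image
answer = model.generate_content(
    ["How many people are visible in this photo?", image]
)
print(answer.text)
```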
Additionally, we'll delve into the core techniques that make LLMs multimodal, including contrastive learning and LIMoE (Learning Multiple Modalities with One Sparse Mixture-of-Experts Model). Learn more here: https://research.google/blog/limoe-learning-multiple-modalities-with-one-sparse-mixture-of-experts-model/
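As a preview of the first technique, here is a minimal sketch of CLIP-style contrastive learning in PyTorch: paired image and text embeddings are pulled together in a shared space while mismatched pairs are pushed apart. The batch size, embedding dimension, and temperature are illustrative assumptions:

```python
# A minimal sketch of CLIP-style contrastive learning between image and text
# embeddings, the core idea behind aligning modalities in a shared space.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (image, text) embeddings."""
    # Normalize so the dot product is cosine similarity
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity logits: entry (i, j) compares image i with text j
    logits = image_emb @ text_emb.t() / temperature

    # Matching pairs lie on the diagonal; treat them as classification targets
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings (batch of 8, dimension 512)
images = torch.randn(8, 512)
texts = torch.randn(8, 512)
print(contrastive_loss(images, texts).item())
```

And here is a toy sketch of the sparse routing idea behind LIMoE, where a learned gate sends each token to a single expert network (top-1 routing). Real LIMoE models route inside a shared Transformer and add auxiliary losses to keep the experts balanced, which this sketch omits:

```python
# A minimal sketch of sparse top-1 mixture-of-experts routing.
# Expert count, sizes, and the toy input are illustrative assumptions.
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    def __init__(self, dim: int = 512, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router: scores experts per token
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_experts)]
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Pick the single best expert per token (top-1 routing)
        gate_probs = self.gate(tokens).softmax(dim=-1)   # (batch, num_experts)
        weight, expert_idx = gate_probs.max(dim=-1)      # best expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                # Scale by the gate weight so routing stays differentiable
                out[mask] = expert(tokens[mask]) * weight[mask].unsqueeze(-1)
        return out

layer = SparseMoELayer()
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```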
Join us to unlock the power of Gemini and push the boundaries of AI!
Henry Ruiz
Research Scientist at Texas A&M AgriLife Research, Google Developer Expert (GDE) in Machine Learning
College Station, Texas, United States