Multi-modal LLMs: Introduction and Avoiding Common Pitfalls

Multi-modal large language models (LLMs) can understand text, images, and videos, and with their ever-increasing context sizes, they open up interesting use cases for application developers. At the same time, LLMs often suffer from hallucinations (fabricated content), outdated information (not based on the latest data), reliance on public data only (no access to private data), and a lack of citations back to original sources. In this talk, we’ll first take a tour of Gemini, Google’s multi-modal LLM, show what’s possible, and demonstrate how to integrate it with your applications. We’ll then explore techniques to overcome common LLM pitfalls, including Retrieval-Augmented Generation (RAG) to enhance prompts with relevant data, ReAct prompting to guide LLMs in verbalizing their reasoning, Function Calling to grant LLMs access to external APIs, and Grounding to link LLM outputs to verifiable information sources.
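
To make the Gemini tour concrete, here is a minimal sketch of a multi-modal call, assuming the google-generativeai Python SDK, an API key in a GEMINI_API_KEY environment variable, and a hypothetical local image file; the model name and prompt are placeholders.

    import os
    import google.generativeai as genai
    from PIL import Image

    # Read the API key from the environment (assumed to be set beforehand).
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])

    # The model name is a placeholder; any multi-modal Gemini model works.
    model = genai.GenerativeModel("gemini-1.5-flash")

    # A multi-modal prompt: a local image (hypothetical file) plus a question.
    image = Image.open("invoice.png")
    response = model.generate_content([image, "What is the total amount on this invoice?"])
    print(response.text)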
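
Retrieval-Augmented Generation can be illustrated without any particular framework: retrieve the snippets most relevant to the question, then prepend them to the prompt. In this sketch the word-overlap scorer is a deliberately simple stand-in for a real embedding-based retriever, and the documents are invented.

    import re

    # Invented mini document store; a real app would index private data here.
    documents = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Support is available Monday to Friday, 9am to 5pm GMT.",
        "Premium plans include priority support and a 99.9% uptime SLA.",
    ]

    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def score(question: str, doc: str) -> int:
        # Toy relevance score: number of words shared with the question.
        return len(words(question) & words(doc))

    def build_rag_prompt(question: str, top_k: int = 2) -> str:
        ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
        context = "\n".join(ranked[:top_k])
        return (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}"
        )

    prompt = build_rag_prompt("How many days do I have to return a purchase?")
    print(prompt)  # in practice, send this augmented prompt to the LLM

In production, the scorer and document list would be replaced by a vector store and embedding model, but the shape of the technique stays the same: ground the prompt in retrieved data before the model answers.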
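
ReAct prompting needs no special API support; it is a prompt format that asks the model to interleave reasoning, tool calls, and tool results before committing to an answer. The template and tool names below are illustrative, not a fixed standard.

    # ReAct-style prompt template: the model interleaves reasoning ("Thought"),
    # tool use ("Action"), and tool results ("Observation") before answering.
    REACT_TEMPLATE = """Answer the question by reasoning step by step.
    You may use these tools: search(query), calculator(expression).
    Use this format:
    Thought: what you are currently thinking
    Action: tool_name(arguments)
    Observation: the result of the action
    ... (repeat Thought/Action/Observation as needed)
    Final Answer: the answer to the question

    Question: {question}
    """

    print(REACT_TEMPLATE.format(question="Who founded the company that built Gemini?"))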
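
Function Calling lets the model request that your code invoke a function and feed the result back into the conversation. Here is a sketch assuming the google-generativeai SDK, which can derive a tool schema from a typed, documented Python function; the weather function is a stub.

    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])

    def get_weather(city: str) -> str:
        """Return the current weather for a city (stub for illustration)."""
        return f"Sunny and 22°C in {city}"  # a real app would call a weather API

    # The SDK derives a tool schema from the function's signature and
    # docstring; the model may then ask for the tool to be invoked.
    model = genai.GenerativeModel("gemini-1.5-flash", tools=[get_weather])
    chat = model.start_chat(enable_automatic_function_calling=True)
    response = chat.send_message("What is the weather like in London right now?")
    print(response.text)

With automatic function calling enabled, the SDK executes the function and returns the observation to the model in a second turn, so response.text already reflects the tool result.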

Mete Atamel

Software Engineer and Developer Advocate at Google

London, United Kingdom
