Session
AI Agent & RAG Evaluation: Metrics, Tools
Building a Retrieval-Augmented Generation (RAG) system is only half the battle — how do you know it's actually working well? In this technical session, we'll demystify RAG evaluation from the ground up. You'll learn what makes RAG evaluation uniquely challenging, explore the key metrics that matter — including context relevance, faithfulness, correctness, completeness, and context coverage — and see how to put them into practice using Amazon Bedrock Evaluation and the open-source RAGAS library.
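As a rough taste of what the demos cover, here is a minimal sketch of scoring a single question-answer record with the open-source RAGAS library. The metric and column names follow the ragas 0.1.x API and may differ from the version shown in the session; the sample record is invented, and by default RAGAS calls an OpenAI judge model unless you pass your own LLM.

```python
# Minimal sketch: score one RAG record with RAGAS (assumes ragas 0.1.x API).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_correctness,
    context_precision,
    context_recall,
)

# One evaluation record: question, retrieved context chunks, generated answer,
# and a reference ("ground truth") answer. The content here is made up.
records = {
    "question": ["What is Amazon Bedrock?"],
    "contexts": [[
        "Amazon Bedrock is a fully managed service that offers foundation "
        "models from leading AI companies through a single API."
    ]],
    "answer": [
        "Amazon Bedrock is a managed AWS service for accessing foundation "
        "models through a single API."
    ],
    "ground_truth": [
        "Amazon Bedrock is a managed service that provides access to "
        "foundation models through a single API."
    ],
}

dataset = Dataset.from_dict(records)

# evaluate() uses an LLM judge under the hood to score each metric per record;
# configure the llm= / embeddings= arguments to use a non-default provider.
result = evaluate(
    dataset,
    metrics=[faithfulness, answer_correctness, context_precision, context_recall],
)
print(result)  # per-metric scores between 0 and 1
```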
Through live demos, we'll walk through how LLM-as-a-judge technology works, how to bring your own datasets for tailored evaluations, and how to compare results across evaluation jobs to iteratively improve your RAG pipelines. Whether you're building with Amazon Bedrock Knowledge Bases or your own custom RAG stack, these techniques apply to any system hosted anywhere.
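The idea behind LLM-as-a-judge is straightforward: prompt a strong model to grade an answer against the retrieved context using a rubric and return a structured score. The sketch below illustrates that pattern with the Bedrock Converse API; the prompt wording, 1-5 rubric, and model ID are illustrative assumptions, not the session's demo code.

```python
# Rough sketch of LLM-as-a-judge: ask a model whether an answer is supported
# by the retrieved context. Prompt, rubric, and model ID are assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

JUDGE_PROMPT = """You are grading a RAG answer for faithfulness.
Context:
{context}

Answer:
{answer}

On a scale of 1-5, how well is the answer supported by the context?
Respond with JSON only: {{"score": <1-5>, "reason": "<one sentence>"}}"""

def judge_faithfulness(
    context: str,
    answer: str,
    model_id: str = "anthropic.claude-3-haiku-20240307-v1:0",
) -> dict:
    """Ask the judge model for a faithfulness score and a short rationale."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{"text": JUDGE_PROMPT.format(context=context, answer=answer)}],
        }],
        inferenceConfig={"maxTokens": 200, "temperature": 0.0},
    )
    # Assumes the model returns bare JSON as instructed by the prompt.
    return json.loads(response["output"]["message"]["content"][0]["text"])

# Example: a claim not present in the context should score low.
print(judge_faithfulness(
    context="RAGAS computes faithfulness by checking answer claims against "
            "the retrieved context.",
    answer="RAGAS was created in 2019 as a managed AWS service.",
))
```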
By the end of this session, you'll have a clear framework for measuring and improving the quality of your AI-powered search and answer generation systems — and you'll have seen it all in action through practical demonstrations.
Juan Pablo Garcia Gonzalez
Solution Architect @ AWS Startups
Boston, Massachusetts, United States