Session
Getting Your LLM Eval Up and Right with Automated LLMOps
Large Language Models (LLMs) are powerful, but determining whether they give correct answers is a major challenge. One key problem is creating a reliable ground truth (an accurate reference answer) for evaluation. Doing this by hand is slow and expensive, and letting LLMs generate their own references can lead to circular errors and unclear benchmarks.
In this session, we explore how open-source evaluation frameworks such as Ragas, DeepEval, and MLflow help solve this problem. We will explain in simple terms how these tools set up clear evaluation pipelines that do not rely solely on human annotations. Instead, they use automated methods to generate a reliable ground truth, making it easier to judge whether an LLM is performing well.
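As a taste of what this looks like in practice, here is a minimal sketch of automated ground-truth (test set) generation with Ragas. It assumes a Ragas 0.1.x-style API, an OpenAI-backed generator, and an illustrative docs/ folder as the source corpus; exact class and method names vary between releases.

```python
# Minimal sketch: synthesize question / ground-truth pairs from your own documents.
# Assumptions: Ragas 0.1.x API, OpenAI credentials configured, "docs/" is illustrative.
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Load the corpus your RAG system answers questions from.
documents = DirectoryLoader("docs/").load()

# LLM-backed generator that creates questions plus reference (ground-truth) answers.
generator = TestsetGenerator.with_openai()
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,  # number of synthetic evaluation examples
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
print(testset.to_pandas())
```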
We will also show you how to integrate these tools into your LLMOps workflow to continuously test and improve your model. You will learn about key evaluation metrics like faithfulness, contextual precision, and answer relevancy.
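To make those metrics concrete, the sketch below scores a single hand-written example with Ragas. The sample question, answer, and contexts are purely illustrative, and the dataset columns and metric names follow Ragas 0.1.x conventions; in a real LLMOps pipeline the rows would come from your application logs or a generated test set.

```python
# Minimal sketch: compute faithfulness, answer relevancy, and context precision with Ragas.
# Assumptions: Ragas 0.1.x API and an illustrative single-row dataset.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

eval_data = Dataset.from_dict({
    "question": ["What does the warranty cover?"],
    "answer": ["The warranty covers manufacturing defects for two years."],
    "contexts": [["Our warranty covers manufacturing defects for 24 months."]],
    "ground_truth": ["Manufacturing defects are covered for two years."],
})

result = evaluate(
    eval_data,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # per-metric scores between 0 and 1
```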
By the end of this session, you will have a practical understanding of how to set up an evaluation framework that automatically generates ground truth and provides clear, actionable feedback.
Suvrakamal Das
Software Engineer @Mattoboard
San Francisco, California, United States