Getting Your LLM Eval Up and Right with Automated LLMOps

Large Language Models (LLMs) are powerful, but determining whether they give correct answers is a major challenge. One core problem is creating a reliable ground truth, an accurate reference answer, for evaluation. Doing this by hand is slow and expensive, while letting LLMs generate their own references can lead to circular errors and unclear benchmarks.

In this session, we explore how open-source evaluation frameworks such as Ragas, DeepEval, and MLflow help solve this problem. We will explain in simple terms how these tools set up clear evaluation pipelines that do not rely only on human annotations. Instead, they use automated methods to generate a reliable ground truth, making it easier to judge whether an LLM is performing well.

We will also show you how to integrate these tools into your LLMOps workflow to continuously test and improve your model. You will learn about key evaluation metrics like faithfulness, contextual precision, and answer relevancy.
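As a rough intuition for what these three metrics measure, here is a toy sketch using simple token overlap. This is not how Ragas or DeepEval actually compute them (those frameworks typically use LLM-based judges); the function names and scoring logic below are illustrative assumptions only.

```python
# Toy, illustrative proxies for three common LLM-eval metrics.
# Real frameworks (Ragas, DeepEval) use LLM judges, not token overlap;
# this sketch only conveys what each metric is asking.
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def faithfulness(answer: str, context: str) -> float:
    """Fraction of answer tokens grounded in the retrieved context."""
    a, c = _tokens(answer), _tokens(context)
    return len(a & c) / len(a) if a else 0.0


def answer_relevancy(answer: str, question: str) -> float:
    """Fraction of question tokens the answer actually addresses."""
    q, a = _tokens(question), _tokens(answer)
    return len(q & a) / len(q) if q else 0.0


def context_precision(context_chunks: list[str], ground_truth: str) -> float:
    """Fraction of retrieved chunks that overlap the reference answer."""
    gt = _tokens(ground_truth)
    relevant = sum(1 for chunk in context_chunks if _tokens(chunk) & gt)
    return relevant / len(context_chunks) if context_chunks else 0.0


if __name__ == "__main__":
    question = "What is the capital of France?"
    chunks = ["Paris is the capital of France.", "France is in Europe."]
    answer = "The capital of France is Paris."
    reference = "Paris is the capital of France."

    print(faithfulness(answer, " ".join(chunks)))
    print(answer_relevancy(answer, question))
    print(context_precision(chunks, reference))
```

In the real frameworks, each score is similarly normalized to [0, 1], but the judgments (is this claim supported by the context? is this chunk relevant?) are made by a model rather than by set intersection.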

By the end of this session, you will have a practical understanding of how to set up an evaluation framework that automatically generates ground truth and provides clear, actionable feedback.

Suvrakamal Das

Software Engineer @Mattoboard

San Francisco, California, United States
