From RAG to Reliable Agents: An Open Source Playbook for Evaluation, Guardrails, and LLMOps
Teams are moving from Retrieval-Augmented Generation (RAG) to agentic workflows that plan, call tools, and take actions. The hard part is no longer making a demo work; it is making behavior reliable, safe, and observable in production.
This session presents a practical, open-source “Day 2” playbook for building trustworthy agents. We cover three pillars:
Offline evaluation: automated eval harnesses using heuristic metrics (groundedness/faithfulness, relevancy) plus agent-specific checks like tool-call correctness and step success rate, with regression gates before release.
Runtime guardrails: interceptors that mitigate prompt injection, block sensitive data leakage and unsafe outputs, and stop unauthorized tool actions via allowlists, policy checks, and redaction.
LLMOps and observability: tracing and structured telemetry to debug multi-turn tool execution, localize failures (retrieval vs. planning vs. tool execution), and monitor drift, latency, and cost.
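The first pillar, offline evaluation with regression gates, can be sketched in a few lines. This is a generic illustration under assumed names (EvalCase, tool_call_correctness, regression_gate are all hypothetical, not the API of Ragas or DeepEval): each test case records the tool calls the agent should have made and the ones it actually made, and a release gate fails when the mean correctness drops below a threshold.

```python
# Hypothetical sketch of an offline eval regression gate: score recorded
# agent traces for tool-call correctness and gate releases on a threshold.
# All names are illustrative, not from any specific eval library.
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    expected_tools: list[str]  # tools the agent should call, in order
    actual_tools: list[str]    # tools it actually called in the recorded trace


def tool_call_correctness(case: EvalCase) -> float:
    """Fraction of expected tool calls found, in order, in the actual trace."""
    remaining = iter(case.actual_tools)  # 'in' consumes the iterator, enforcing order
    hits = sum(1 for tool in case.expected_tools if tool in remaining)
    return hits / len(case.expected_tools) if case.expected_tools else 1.0


def regression_gate(cases: list[EvalCase], threshold: float = 0.9) -> bool:
    """Release gate: mean tool-call correctness must meet the threshold."""
    mean = sum(tool_call_correctness(c) for c in cases) / len(cases)
    return mean >= threshold


cases = [
    EvalCase("refund status?", ["search_orders", "get_refund"],
             ["search_orders", "get_refund"]),
    EvalCase("cancel my order", ["get_order", "cancel_order"],
             ["get_order"]),  # missing call: scores 0.5
]
print(regression_gate(cases))  # mean is 0.75 < 0.9, so the gate fails → False
```

In practice the same gate would aggregate several metrics (groundedness, relevancy, step success rate), but the structure is the same: deterministic scores over a fixed test set, compared against a threshold in CI before release.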
Attendees leave with a reference architecture, metric checklist, and implementation patterns using open-source components (e.g., Ragas/DeepEval for evals, guardrail libraries, Langfuse/OpenTelemetry-style tracing).
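As one concrete guardrail pattern from the second pillar, a tool-call interceptor can pair an allowlist with output redaction. The sketch below is a minimal, hand-rolled illustration (the tool names, the ALLOWED_TOOLS policy, and the naive email regex are all assumptions, not the API of any guardrail library):

```python
# Hypothetical runtime guardrail: allowlist tool calls and redact obvious
# sensitive data from model output before it reaches the user.
import re

ALLOWED_TOOLS = {"search_docs", "get_weather"}  # illustrative allowlist
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # naive PII pattern, demo only


class ToolNotAllowed(Exception):
    """Raised when the agent attempts a tool outside the allowlist."""


def guard_tool_call(tool_name: str, args: dict) -> None:
    """Policy check: block any tool the agent is not explicitly allowed to use."""
    if tool_name not in ALLOWED_TOOLS:
        raise ToolNotAllowed(f"blocked tool call: {tool_name}")


def redact(text: str) -> str:
    """Replace email addresses with a placeholder before returning output."""
    return EMAIL.sub("[REDACTED]", text)


guard_tool_call("search_docs", {"query": "refund policy"})  # passes silently
print(redact("Contact alice@example.com"))  # → Contact [REDACTED]
```

Production guardrail libraries layer richer policies (semantic injection detection, schema validation, human-in-the-loop approval), but they follow this interceptor shape: a check before every tool call and a filter on every output.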
Puspanjali Sarma
Engineering Leader | Principal Architect | Published Author | Thought Leader | Mentor | Speaker | 40under40 Data Scientist | ML | AI | Data Engineering | Generative AI | Agents & Agentic AI
Hyderabad, India