AI Quality Engineering: Observability, Governance & Reliability for LLM Agent Architectures

As engineering teams move from single-model assistants to orchestrating networks of tool-using agents and complex multilingual generation pipelines, they confront a new class of systemic failures. These LLM-powered architectures behave like distributed systems, not deterministic functions, which leads to inter-agent disagreement, reasoning drift, semantic divergence across locales, and subtle cascade failures that traditional QA cannot detect.

This session introduces AI Quality Engineering (AI-QE), a discipline for making LLM-powered systems observable, reliable, and governable at scale. Drawing from real-world deployments in multi-agent orchestration and global localization workflows, we will walk through the telemetry, evaluation methods, and governance patterns required to ensure alignment and robustness over time.

Key Takeaways:
1. Distributed Reasoning Observability: Building telemetry and instrumentation to trace reasoning hand-offs across multi-agent workflows, model delegation-loop failures, and measure tool-use effectiveness.
2. Inter-Agent Evaluation Metrics: Quantifying collaboration and systemic reliability using metrics like context-retention fidelity, agreement-divergence ratios, and hallucination cascade detection.
3. Multilingual Governance: Techniques to prevent semantic drift and cultural inconsistencies across languages and locales, a core challenge in global RAG pipelines.
4. Quality Gates & CI/CD: Implementing quality gates, domain-specific rubric scoring, and LLM-as-judge ensembles that integrate directly into continuous integration and deployment processes.
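
As a rough illustration of the fourth takeaway, a quality gate backed by an LLM-as-judge ensemble might be sketched as follows. This is a minimal sketch and not the speaker's implementation: the function name `failing_criteria`, the rubric criterion names, the thresholds, and the choice of the median as the ensemble aggregate are all assumptions made for the example, and the judge scores are hard-coded rather than produced by real model calls.

```python
from statistics import median
from typing import Dict, List


def failing_criteria(
    judge_scores: Dict[str, List[float]],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the rubric criteria whose median judge score is below its threshold.

    judge_scores maps each criterion to the scores given by the judges in the
    ensemble (hypothetical 0.0 to 1.0 scale). A criterion with no scores at all
    is treated as failing, so a missing evaluation cannot pass silently.
    """
    failures = []
    for criterion, minimum in thresholds.items():
        scores = judge_scores.get(criterion, [])
        if not scores or median(scores) < minimum:
            failures.append(criterion)
    return failures


# Hypothetical scores from three judges for two rubric criteria.
scores = {
    "faithfulness": [0.9, 0.8, 0.4],
    "locale_consistency": [0.7, 0.6, 0.65],
}
thresholds = {"faithfulness": 0.75, "locale_consistency": 0.7}

failures = failing_criteria(scores, thresholds)
gate_passed = not failures
print("gate passed" if gate_passed else f"gate failed: {failures}")
```

In a CI/CD pipeline, a step like this would typically end with a non-zero exit code when `gate_passed` is false so that the deployment is blocked. Using the median keeps a single outlier judge from flipping the result, though other aggregates (mean, majority vote) are equally plausible choices.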

Rajeshwari Sah

Rajeshwari Sah, Machine Learning Engineer at Apple

Sunnyvale, California, United States
