Your Agent Lies and Passes Every Test

Your agent gives a confident answer. It sounds right, your tests say PASS, and it is completely made up. A test that only checks "did it finish the task" never catches a lie. It gets worse across a conversation: an agent can start safe and drift, one reasonable-sounding turn at a time, into advice it should never give. This talk shows how to catch both: spot a fabricated answer with zero-shot detection that needs no labeled training data, score every turn so you see drift as it happens instead of only grading the final answer, and add a real-time guardrail that swaps an unsafe answer for a safe one before the user ever sees it.
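As a rough illustration of the per-turn scoring and guardrail ideas in the abstract, here is a minimal sketch. It is not the speaker's implementation. Every name in it (`toy_risk_score`, `guard_conversation`, `SAFE_FALLBACK`, the 0.3 threshold, the phrase list) is hypothetical, and the keyword scorer is only a stand-in for whatever model-based detector a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical safe replacement shown to the user when a turn is blocked.
SAFE_FALLBACK = "I can't help with that. Please consult a qualified professional."


@dataclass
class TurnResult:
    turn: int        # 1-based turn index
    risk: float      # score in [0, 1] for this turn
    delivered: str   # what the user actually sees
    replaced: bool   # True if the guardrail swapped the answer


def toy_risk_score(answer: str) -> float:
    """Placeholder scorer (assumption): fraction of risky phrases present.

    A real system would call a detector model here instead.
    """
    risky_phrases = (
        "skip your medication",
        "guaranteed returns",
        "no need to see a doctor",
    )
    text = answer.lower()
    hits = sum(phrase in text for phrase in risky_phrases)
    return hits / len(risky_phrases)


def guard_conversation(
    answers: List[str],
    score: Callable[[str], float] = toy_risk_score,
    threshold: float = 0.3,
) -> List[TurnResult]:
    """Score every turn, not just the last, and replace risky answers."""
    results = []
    for turn, answer in enumerate(answers, start=1):
        risk = score(answer)
        replaced = risk >= threshold
        delivered = SAFE_FALLBACK if replaced else answer
        results.append(TurnResult(turn, risk, delivered, replaced))
    return results


if __name__ == "__main__":
    conversation = [
        "Drink water and rest.",
        "Some people reduce doses over time.",
        "You can skip your medication; no need to see a doctor.",
    ]
    for result in guard_conversation(conversation):
        print(result.turn, round(result.risk, 2), result.replaced)
```

Because each turn gets its own score, a log of `risk` values makes the drift visible before the final answer, and the swap happens before anything reaches the user.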

Elizabeth Fuentes Leone

Developer Advocate

San Francisco, California, United States
