Beyond the Demo: Testing Strategies for Production AI Agents
AI agents are easy to demo, but much harder to trust in production than traditional software. They can return different results for the same prompt and context, and we cannot prove they will behave correctly every time. A small prompt change, a model switch, an unexpected tool response, a timeout or a streaming issue can change the behaviour of the whole application. So how do we test an AI agent that is part of a real system?
This session does not replace the unit and integration tests you would already write for a JVM application. Instead, it focuses on the additional testing strategies you need once an AI agent becomes part of the system.
We will demonstrate a practical testing strategy for production AI agents and run the tests live. First, we will test the agent graph itself: verifying flow, decisions and tool calls without paying for model inference on every test run. This gives fast, deterministic feedback about the orchestration of the agent, while making clear what this layer does not prove: the quality of the model-dependent output.
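As a rough illustration of this first layer, the sketch below replaces the model with a scripted stand-in so the orchestration can be asserted without any inference. It is not the framework or code used in the session: the `Model` and `Tool` interfaces, the `tool:weather:` and `final:` action strings, and the `run` loop are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class GraphSketch {

    /** Hypothetical model abstraction: returns the next action for the history so far. */
    interface Model {
        String nextAction(List<String> history);
    }

    /** Hypothetical tool abstraction. */
    interface Tool {
        String call(String input);
    }

    /** Scripted stand-in for the model: replays fixed decisions, no inference involved. */
    static class ScriptedModel implements Model {
        private final Iterator<String> script;

        ScriptedModel(String... actions) {
            this.script = Arrays.asList(actions).iterator();
        }

        @Override
        public String nextAction(List<String> history) {
            return script.next();
        }
    }

    /** Minimal agent loop. Returns a trace of the steps taken so a test can assert on it. */
    static List<String> run(Model model, Tool weatherTool, String userInput, int maxSteps) {
        List<String> trace = new ArrayList<>();
        List<String> history = new ArrayList<>();
        history.add("user:" + userInput);
        for (int step = 0; step < maxSteps; step++) {
            String action = model.nextAction(history);
            if (action.startsWith("tool:weather:")) {
                String arg = action.substring("tool:weather:".length());
                trace.add("call weather(" + arg + ")");
                history.add("tool_result:" + weatherTool.call(arg));
            } else if (action.startsWith("final:")) {
                trace.add("answer");
                return trace;
            } else {
                trace.add("unknown action");
                return trace;
            }
        }
        trace.add("step limit reached");
        return trace;
    }

    public static void main(String[] args) {
        Model scripted = new ScriptedModel("tool:weather:Vught", "final:It is sunny");
        System.out.println(run(scripted, city -> "sunny", "Weather in Vught?", 5));
    }
}
```

A test at this level can check that the tool was called with the expected argument and that the loop stops at the step limit. It says nothing about whether a real model would choose those actions, which is the gap the next layer addresses.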
Next, we will use eval-driven development. We will update a prompt live, run evaluation tests, compare the results, and discuss how this helps when improving prompts, changing context, switching models or preventing regressions.
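One way to picture the compare step is a small scoring harness that runs the same cases against a baseline and a candidate and flags a drop in pass rate. This is only a sketch under simplifying assumptions: the agent is represented as a plain `Function<String, String>`, the check is a naive "output contains the expected phrase" rule, and `passRate` and `isRegression` are hypothetical helper names. Real evals typically use richer scoring and call an actual model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class EvalSketch {

    /** Fraction of cases whose output contains the expected phrase (case-insensitive). */
    static double passRate(Map<String, String> cases, Function<String, String> agent) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        int passed = 0;
        for (Map.Entry<String, String> evalCase : cases.entrySet()) {
            String output = agent.apply(evalCase.getKey());
            String expected = evalCase.getValue().toLowerCase();
            if (output != null && output.toLowerCase().contains(expected)) {
                passed++;
            }
        }
        return (double) passed / cases.size();
    }

    /** True when the candidate scores lower than the baseline by more than the tolerance. */
    static boolean isRegression(double baseline, double candidate, double tolerance) {
        return candidate < baseline - tolerance;
    }

    public static void main(String[] args) {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("What is the capital of France?", "Paris");
        cases.put("What is 2 + 2?", "4");

        // Stand-ins for "agent with old prompt" and "agent with new prompt".
        Function<String, String> oldPrompt = q -> q.contains("France") ? "Paris" : "I do not know";
        Function<String, String> newPrompt = q -> q.contains("France") ? "The capital is Paris." : "2 + 2 = 4";

        double baseline = passRate(cases, oldPrompt);
        double candidate = passRate(cases, newPrompt);
        System.out.println("baseline=" + baseline + " candidate=" + candidate
                + " regression=" + isRegression(baseline, candidate, 0.05));
    }
}
```

The tolerance parameter is there because model output varies between runs, so a small dip in score is not necessarily a regression.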
Finally, we will test the application around the agent. We will use mocked AI inference directly in the cloud to simulate slow responses, streaming and provider failures, so we can run repeatable cloud integration tests without depending on real AI services for every scenario.
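The session describes mocked inference running in the cloud. As a much smaller, in-process approximation of the same idea, the sketch below uses fake inference clients that are slow or that fail, and checks that the surrounding application code falls back instead of hanging or crashing. `InferenceClient`, `slow`, `failing`, `answer` and the fallback strings are hypothetical names for this example, and streaming is left out to keep it short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class MockInferenceSketch {

    /** Hypothetical client interface the application uses to reach a model provider. */
    interface InferenceClient {
        String complete(String prompt) throws Exception;
    }

    /** Fake client that waits before replying, to simulate a slow provider. */
    static InferenceClient slow(long delayMillis, String reply) {
        return prompt -> {
            Thread.sleep(delayMillis);
            return reply;
        };
    }

    /** Fake client that always throws, to simulate a provider failure. */
    static InferenceClient failing(String message) {
        return prompt -> {
            throw new IllegalStateException(message);
        };
    }

    /** Application code under test: calls the client with a timeout and a fallback. */
    static String answer(InferenceClient client, String prompt, long timeoutMillis) {
        CompletableFuture<String> call = CompletableFuture.supplyAsync(() -> {
            try {
                return client.complete(prompt);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        try {
            return call.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "FALLBACK: timeout";
        } catch (ExecutionException e) {
            return "FALLBACK: provider error";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FALLBACK: interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(answer(slow(10, "hello"), "Hi", 2000));
        System.out.println(answer(failing("503 from provider"), "Hi", 2000));
        System.out.println(answer(slow(2000, "too late"), "Hi", 50));
    }
}
```

Because the delays and failures are configured rather than waited for, the same scenarios can be replayed on every run, which is the property the mocked cloud tests rely on.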
Attendees will leave with a layered testing model for production AI agents: graph tests for deterministic orchestration, evals for model-dependent behaviour and prompt quality, and mocked cloud integration tests for the full application experience.
Elena van Engelen
Independent Senior Software Engineer
Vught, The Netherlands