Build a 3-Agent System that Refuses to Guess: Hands-on Multi-Agent Orchestration with Evals

Most multi-agent tutorials teach you how to make agents talk to each other. This workshop teaches you how to make agents refuse to talk when they shouldn't.
We'll build a working 3-agent pipeline from scratch during the session. Each agent has a single job, a structured JSON contract defining its input and output, and a refusal threshold: a confidence score below which the agent stops processing and surfaces uncertainty to the user instead of passing a bad answer downstream.
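A minimal sketch of what one agent's JSON output contract and refusal check could look like. It assumes each extracted field carries a confidence score in [0, 1]. The `AgentResult` shape, the `apply_threshold` helper, and the 0.8 threshold are illustrative assumptions, not the workshop's actual code.

```python
import json
from dataclasses import dataclass, field

REFUSAL_THRESHOLD = 0.8  # hypothetical value; tune per field and per domain


@dataclass
class AgentResult:
    """Illustrative JSON contract: every agent emits this shape."""
    agent: str
    fields: dict      # field name -> value (None when the agent refused to guess)
    confidence: dict  # field name -> score in [0, 1]
    flagged: list = field(default_factory=list)  # fields below the threshold

    @property
    def refused(self) -> bool:
        # Any flagged field means this output must not be trusted downstream.
        return bool(self.flagged)

    def to_json(self) -> str:
        return json.dumps({
            "agent": self.agent,
            "fields": self.fields,
            "confidence": self.confidence,
            "flagged": self.flagged,
            "refused": self.refused,
        })


def apply_threshold(agent, fields, confidence, threshold=REFUSAL_THRESHOLD):
    """Flag (and blank out) every field whose confidence is below the threshold."""
    # A field with no confidence score is treated as 0.0, i.e. unknown.
    flagged = sorted(name for name in fields if confidence.get(name, 0.0) < threshold)
    safe = {name: (None if name in flagged else value) for name, value in fields.items()}
    return AgentResult(agent, safe, dict(confidence), flagged)


result = apply_threshold(
    "parser",
    fields={"tuition": 12000, "grant": 4500},
    confidence={"tuition": 0.95, "grant": 0.6},
)
print(result.to_json())
```

Blanking the low-confidence value to `null`, rather than passing it along with a warning, is the design choice that keeps a guessed number from quietly reaching a later agent.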
What you'll build in 75 minutes:

Agent 1 (Parser): ingests a raw document and extracts structured fields. If any field's confidence falls below the threshold, it flags the gap instead of guessing.
Agent 2 (Analyzer): takes the parsed output and performs comparison logic. Refuses to run if Agent 1 flagged incomplete data.
Agent 3 (Generator): produces a user-facing output (a summary, recommendation, or draft). Carries a confidence surface that tells the end user what the system is sure about and what it isn't.
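The three roles above can be sketched as plain Python functions chained by a one-line orchestrator. This is a hypothetical, offline version: the LLM calls are replaced by stub inputs that already carry (value, confidence) pairs, and the field names (`cost`, `aid`), the `Handoff` type, and the 0.8 threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.8  # hypothetical refusal threshold


@dataclass
class Handoff:
    """Illustrative contract passed from one agent to the next."""
    data: dict
    confidence: dict
    flagged: list = field(default_factory=list)


def parser(raw: dict) -> Handoff:
    """Agent 1. Stub: `raw` maps field -> (value, confidence) instead of calling an LLM."""
    data = {name: value for name, (value, _) in raw.items()}
    conf = {name: score for name, (_, score) in raw.items()}
    flagged = sorted(name for name, score in conf.items() if score < THRESHOLD)
    for name in flagged:
        data[name] = None  # flag the gap instead of guessing
    return Handoff(data, conf, flagged)


def analyzer(parsed: Handoff) -> Handoff:
    """Agent 2. Refuses to run its comparison if the Parser flagged gaps."""
    if parsed.flagged:
        # Pass the uncertainty through untouched; compute nothing.
        return Handoff({}, dict(parsed.confidence), list(parsed.flagged))
    net_cost = parsed.data["cost"] - parsed.data["aid"]
    # A derived value is only as trustworthy as its weakest input.
    return Handoff({"net_cost": net_cost}, {"net_cost": min(parsed.confidence.values())})


def generator(analysis: Handoff) -> dict:
    """Agent 3. User-facing output plus a confidence surface."""
    conf = analysis.confidence
    sure = [k for k in sorted(conf) if conf[k] >= THRESHOLD and k not in analysis.flagged]
    unsure = sorted(set(analysis.flagged) | {k for k in conf if conf[k] < THRESHOLD})
    if analysis.flagged:
        status, summary = "needs_review", None  # surface uncertainty, not a number
    else:
        status = "ok"
        summary = f"Estimated net cost: ${analysis.data['net_cost']:,}"
    return {"status": status, "summary": summary, "sure_about": sure, "not_sure_about": unsure}


def run_pipeline(raw: dict) -> dict:
    """Orchestrator: each agent's output is the next agent's input."""
    return generator(analyzer(parser(raw)))


print(run_pipeline({"cost": (30000, 0.97), "aid": (12000, 0.91)}))
print(run_pipeline({"cost": (30000, 0.97), "aid": (12000, 0.55)}))
```

In the second call the low-confidence `aid` value stops the Analyzer, so the Generator returns `needs_review` with `aid` listed under `not_sure_about` instead of a dollar figure.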

The architecture pattern is the one I used for AidLens, a financial aid decoder I shipped in May 2026 for first-generation college students, where confident wrong answers cost people real money. But the pattern is framework-agnostic and not tied to financial aid: you can apply it in any domain where your agent pipeline serves users who can't verify the output themselves.
Tech stack for the workshop: Python, OpenAI or Anthropic API (bring your own key or use the shared sandbox), no framework dependency (we'll build the orchestrator from scratch so you understand every decision). You'll leave with a running 3-agent repo on your machine.
What makes this different from a LangChain/CrewAI tutorial:

We design the eval FIRST, then build the agents to pass it (not build first, eval later).
Every agent has a refusal mode, not just a happy path.
The orchestrator is 50 lines of Python, not a framework. You'll understand what's happening.
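One way to read "design the eval FIRST": write the cases, including the ones whose correct outcome is a refusal, before any agent exists, then score every candidate pipeline against them. The case format, the `needs_review` status string, and the two stub pipelines below are hypothetical.

```python
from typing import Callable

# Written BEFORE any agent exists. Each input maps field -> (value, confidence);
# two of the three cases expect a refusal, not an answer.
EVAL_CASES = [
    {"name": "clean document",
     "input": {"cost": (30000, 0.97), "aid": (12000, 0.91)},
     "expect_status": "ok"},
    {"name": "blurry aid figure",
     "input": {"cost": (30000, 0.97), "aid": (12000, 0.55)},
     "expect_status": "needs_review"},
    {"name": "blurry cost figure",
     "input": {"cost": (30000, 0.40), "aid": (12000, 0.93)},
     "expect_status": "needs_review"},
]


def run_evals(pipeline: Callable[[dict], dict], cases=EVAL_CASES) -> dict:
    """Run every case and bucket it as passed or failed by expected status."""
    results = {"passed": [], "failed": []}
    for case in cases:
        try:
            status = pipeline(case["input"]).get("status")
        except Exception as exc:  # a crash is a failure, never a refusal
            status = f"error: {type(exc).__name__}"
        bucket = "passed" if status == case["expect_status"] else "failed"
        results[bucket].append((case["name"], status))
    return results


def always_answers(raw: dict) -> dict:
    """Stub happy-path pipeline: answers no matter what."""
    return {"status": "ok"}


def refuses_when_unsure(raw: dict) -> dict:
    """Stub pipeline with a refusal mode at a hypothetical 0.8 threshold."""
    unsure = [name for name, (_, score) in raw.items() if score < 0.8]
    return {"status": "needs_review" if unsure else "ok"}


for candidate in (always_answers, refuses_when_unsure):
    report = run_evals(candidate)
    print(candidate.__name__, len(report["passed"]), "passed,", len(report["failed"]), "failed")
```

The happy-path stub passes only the clean case, which is the point of writing refusal cases first: a pipeline that never refuses cannot score well, no matter how fluent its answers are.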

Prerequisites: Comfortable reading Python. Familiarity with LLM API calls (any provider). Laptop with Python 3.10+ and an API key (OpenAI or Anthropic).

Sumaiya Shrabony

I help data and operations teams make AI usable at work: not hype, not theory, the operator version that survives real users, governance, and broken dashboards.

Denver, Colorado, United States
