$47 a Minute, and Your Agent Calls the Wrong API

Your agent works in dev. In production it costs $47 a minute and calls the wrong API. Production agents fail two ways: financially, through runaway cost, and functionally, through wrong tool usage.

A 5 percent accuracy gain at 20 times the cost is not sustainable. Pareto frontier analysis plots quality against cost across models and finds the options for which no alternative is both cheaper and better. Prompt caching cuts multi-turn cost by roughly 90 percent: an agent that re-reads a 20K-token system prompt on every turn wastes most of its tokens, and caching takes a 10-turn conversation from $4.80 to $0.52.

Tool correctness is cheaper than you think. Cascading checks catch 83 percent of errors for free with deterministic rules, more in 1 to 10 ms with constraint validation, and the rest with one LLM call. Full coverage costs 20 times less than validating everything with an LLM ($0.12 versus $2.40 for 1,200 cases).

You'll walk away with:
• Per-invocation cost tracking with budget alarms
• Pareto analysis to pick the best-value model
• Cascading tool validation that catches errors at 20x lower cost
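To make the Pareto idea in the abstract concrete, here is a minimal sketch of how a cost/quality frontier could be computed. It is not taken from the session: the `Candidate` type, the model names, and every number are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    cost: float     # e.g. dollars per 1,000 tasks (illustrative unit)
    quality: float  # e.g. task success rate in [0, 1] (illustrative unit)


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A dominates B when A costs no more and scores no worse than B,
    and is strictly better on at least one of the two axes.
    """
    frontier = []
    for c in candidates:
        dominated = any(
            o.cost <= c.cost
            and o.quality >= c.quality
            and (o.cost < c.cost or o.quality > c.quality)
            for o in candidates
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.cost)


# Hypothetical measurements, made up for this example only.
models = [
    Candidate("small", 1.0, 0.80),
    Candidate("medium", 4.0, 0.86),
    Candidate("medium-alt", 5.0, 0.84),  # costs more and scores lower than "medium"
    Candidate("large", 20.0, 0.90),
]
frontier_names = [c.name for c in pareto_frontier(models)]
```

With these made-up numbers, "medium-alt" drops out because "medium" is both cheaper and better. Whether "large" is worth five times the cost of "medium" for a small quality gain is still a judgment call; the frontier only removes options that are never the right choice.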
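The cascading validation described in the abstract might look roughly like the sketch below: run the free deterministic checks first, then cheap constraint checks, and call an LLM only for tool calls that survive both. The tool registry, the value ranges, and the `llm_judge` callback are all assumptions for illustration, not the speaker's implementation.

```python
from typing import Callable, Optional, Tuple

# Hypothetical tool registry: tool name -> expected argument types.
TOOLS = {"get_weather": {"city": str, "days": int}}


def check_rules(call: dict) -> Optional[str]:
    """Tier 1: deterministic schema checks, no model call."""
    schema = TOOLS.get(call.get("tool"))
    if schema is None:
        return "unknown tool"
    args = call.get("args", {})
    if set(args) != set(schema):
        return "wrong argument names"
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            return f"bad type for {name}"
    return None


def check_constraints(call: dict) -> Optional[str]:
    """Tier 2: cheap value constraints (ranges here are made up)."""
    args = call["args"]
    if not args["city"].strip():
        return "empty city"
    if not 1 <= args["days"] <= 14:
        return "days out of range"
    return None


def validate(
    call: dict, llm_judge: Callable[[dict], Optional[str]]
) -> Tuple[str, Optional[str]]:
    """Return (tier that decided, error or None), stopping at the first failure."""
    for tier, check in (("rules", check_rules), ("constraints", check_constraints)):
        error = check(call)
        if error:
            return tier, error
    # Tier 3: only calls that passed the cheap tiers reach the LLM.
    return "llm", llm_judge(call)


# Stand-in for a real model call; it records how often it is invoked.
llm_calls = []

def fake_judge(call: dict) -> Optional[str]:
    llm_calls.append(call)
    return None

r1 = validate({"tool": "get_wether", "args": {}}, fake_judge)
r2 = validate({"tool": "get_weather", "args": {"city": "Lima", "days": 30}}, fake_judge)
r3 = validate({"tool": "get_weather", "args": {"city": "Lima", "days": 3}}, fake_judge)
```

In this sketch the misspelled tool name and the out-of-range argument are rejected without any model call, so only the third call reaches the (fake) LLM judge. That ordering is where the cost savings claimed in the abstract would come from.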

Elizabeth Fuentes Leone

Developer Advocate

San Francisco, California, United States
