
Building Voice AI That Doesn't Fall Apart in Production

Real-time voice AI agents look magical in a demo and then fall apart the moment you put them in front of real users on a real phone line. This talk is a field guide to what breaks and how to fix it, drawn from shipping voice AI infrastructure at Vapi and Telnyx and running production agents at scale.

We'll walk through the architecture of a production voice agent pipeline (streaming speech-to-text, LLM orchestration, and streaming text-to-speech) and the failure modes that don't show up until you hit scale.

Some topics we'll cover in the session:

- Latency budgets and where they leak. First-token latency, TTS time-to-first-audio, and the cascading delays that turn a snappy agent into a slow one. How to measure each hop and where the real costs hide.
- Interruption handling and full-duplex vs half-duplex. Barge-in, echo, overlapping speech, and why most "real-time" implementations aren't actually real-time.
- Multi-provider routing and fallback. Why single-provider voice AI is fragile, how to route across STT/TTS providers based on latency, cost, and quality, and what the actual tradeoffs look like in production data.
- The cost-per-minute math nobody publishes. STT, LLM tokens, and TTS each have different pricing models and unit economics. We'll break down what a real voice agent minute actually costs at scale and where inference economics are heading.
- What changes when you go from 10 calls to 10,000. Concurrency, queueing, SIP/media handling, regional latency, and the infrastructure decisions that don't matter until they suddenly do.
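
As a rough illustration of the latency-budget topic, measuring each hop and comparing the total against a budget might look like the Python sketch below. The `timed` and `summarize` helpers, the hop names, and every number are hypothetical placeholders chosen for illustration; they are not measurements or code from the talk.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(hop_name, sink):
    """Record the wall-clock duration of one pipeline hop, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[hop_name] = (time.perf_counter() - start) * 1000.0


def summarize(hop_latencies_ms, budget_ms):
    """Total the per-hop latencies and report the slowest hop and any overrun."""
    total = sum(hop_latencies_ms.values())
    worst_hop = max(hop_latencies_ms, key=hop_latencies_ms.get)
    return {
        "total_ms": total,
        "over_budget_ms": max(0.0, total - budget_ms),
        "worst_hop": worst_hop,
    }


# Hypothetical per-hop numbers for a single conversational turn (not real data).
example_turn = {
    "stt_final_transcript": 150.0,
    "llm_first_token": 400.0,
    "tts_first_audio": 200.0,
}
report = summarize(example_turn, budget_ms=600.0)
print(report)
```

In a live pipeline, each hop would be wrapped as `with timed("llm_first_token", sink): ...` so that the numbers come from the actual calls instead of constants.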

Abhishek Sharma

Abhishek is a senior technical product marketing manager at Telnyx, an infrastructure company for real-time communications.

South San Francisco, California, United States
