Building Realtime Voice AI Agents in the Physical World

Voice AI is moving beyond web apps and chat interfaces into toys, wearables, kiosks, robots, and other physical devices. But building a realtime AI agent on hardware is very different from building one in the browser.

In this session, I’ll show how to build a working realtime voice AI agent that runs on an ESP32-based device with a microphone, a speaker, and a cloud speech-to-speech pipeline. We’ll cover the full stack: audio capture, streaming over WebSockets, latency tradeoffs, speech-to-text, LLM reasoning, text-to-speech, device state, and the practical constraints of running AI interactions on low-power hardware.
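To make the audio capture and streaming steps concrete, here is a minimal Python sketch of how raw microphone audio might be cut into fixed-duration frames before being sent as binary WebSocket messages. It is an illustration only, not the pipeline from the talk: the 16 kHz sample rate, 16-bit mono PCM format, 20 ms frame length, and all function names are assumptions chosen for the example.

```python
# Assumed audio format for illustration; the abstract does not specify one.
SAMPLE_RATE_HZ = 16_000   # samples per second (assumption)
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM (assumption)
FRAME_MS = 20             # duration of each streamed frame (assumption)


def frame_size_bytes(sample_rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE,
                     frame_ms=FRAME_MS):
    """Return the number of bytes in one frame of raw PCM audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample


def split_into_frames(pcm, frame_bytes):
    """Yield fixed-size frames, padding the final frame with silence."""
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if len(frame) < frame_bytes:
            # Zero bytes are silence in signed 16-bit PCM.
            frame += b"\x00" * (frame_bytes - len(frame))
        yield frame


def uplink_kbps(sample_rate_hz=SAMPLE_RATE_HZ,
                bytes_per_sample=BYTES_PER_SAMPLE):
    """Return the raw (uncompressed) uplink bitrate in kilobits per second."""
    return sample_rate_hz * bytes_per_sample * 8 / 1000


if __name__ == "__main__":
    size = frame_size_bytes()
    captured = bytes(1000)  # stand-in for bytes read from a microphone
    frames = list(split_into_frames(captured, size))
    print(size, len(frames), uplink_kbps())
```

Under these assumptions each frame is 640 bytes and uncompressed audio needs 256 kbit/s of uplink. Shorter frames reduce the buffering delay added on the device but increase the number of messages sent per second, which is one of the latency tradeoffs the session mentions.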

The session includes a live demo of a physical device having a realtime spoken conversation with an AI character, plus a breakdown of the architecture behind it.

Attendees will leave with a clear mental model for how to bring AI agents into the physical world, what breaks in production, and how to think about latency, reliability, and user experience when AI leaves the screen.

Akash Deb

Founder, CEO at ElatoAI

San Francisco, California, United States
