Session
Building Voice AI Agent That Listens, Understands, and (Most Importantly) Sells
Voice commerce is evolving beyond simple chatbots. Google's Multimodal Live API brings 600ms-latency real-time conversations with native audio processing, multimodal understanding, and natural interruption support through bidirectional WebSocket streaming.
We built an AI shopping assistant that doesn't just talk; it understands products, answers technical questions, and guides customers through purchases using voice. By combining Live API with RAG, our assistant accesses product catalogs and knowledge bases in real-time through function calling, creating seamless consultation experiences.
This talk walks through our development journey from prototype to production. You'll see how we integrated Live API's multimodal capabilities with RAG infrastructure, designed function calling patterns for dynamic product queries, managed conversation context across sessions, and optimized for real-time performance.
We'll share practical lessons: WebSocket architecture decisions, RAG pipeline design for product knowledge, handling edge cases in voice interactions, and scaling considerations. You'll leave with actionable insights for building voice-enabled applications, a sense of when RAG enhances conversational AI, and ways to avoid the common pitfalls we encountered.
Perfect for developers exploring voice commerce, conversational AI, or RAG implementations in production systems.
Sasha Denisov
EPAM, Chief Software Engineer, AI, Flutter, Dart and Firebase GDE
Berlin, Germany