Session
Build Your Own VoiceAI on the Edge: A Hands-On Workshop with ASR, Local LLMs, and Speech Synthesis
Voice interfaces are becoming a core part of modern applications, but most implementations rely heavily on cloud services for speech recognition, inference, and speech synthesis. While convenient, this architecture introduces latency, operational cost, and privacy concerns that can make it unsuitable for many environments.
In this hands-on workshop, participants will build a fully local VoiceAI pipeline using open-source tools that run entirely on their own machine. By the end of the session, attendees will have a working system that captures spoken audio, converts it to text, runs inference using a local language model, and responds with synthesized speech, all without using external APIs.
The workshop will guide participants through setting up llama.cpp for local inference, configuring speech-to-text and text-to-speech pipelines, and exposing the model through a local HTTPS endpoint that can be tested using tools like Postman.
Rather than focusing on theory, this workshop emphasizes practical implementation. Attendees will write and run the code themselves, explore the architecture of a modular voice pipeline, and experiment with model tuning and performance tradeoffs when running AI workloads on consumer hardware.
Participants will leave with a working VoiceAI prototype and a deeper understanding of how open-source tools can be used to build privacy-first, edge-based AI systems.
What Participants Will Build:
During the workshop, participants will build a working offline VoiceAI assistant with the following capabilities:
- Capture spoken audio from a microphone
- Convert speech to text using a local STT pipeline
- Run prompts through a locally hosted LLM
- Convert generated responses to speech
- Play the response back to the user
- Expose the inference engine through an HTTPS endpoint for external clients
The final system will run entirely on the participant’s laptop.
Workshop Agenda
Part 1 — Edge AI and Voice System Architecture
We begin by walking through the architecture of a local VoiceAI system and why edge-based inference is becoming increasingly practical.
Topics covered:
- VoiceAI pipeline design
- Local vs cloud inference tradeoffs
- Overview of the tools used in the workshop
- How the components interact
Participants will also review the architecture diagram that the rest of the workshop builds toward.
Part 2 — Running a Local LLM (60 min)
Participants will install and run a local language model using llama.cpp.
Hands-on steps include:
- Downloading and running a quantized LLM
- Running the inference server from the command line
- Exposing the model through an HTTP API
- Testing prompts using curl and Postman
Participants will learn how to treat the LLM as a service that other components can interact with.
Part 3 — Speech-to-Text Integration
Next, participants will add a local speech-to-text pipeline.
Hands-on exercises include:
- Capturing microphone input
- Converting audio to text
- Sending transcripts to the LLM server
- Inspecting responses
This step establishes the first half of the voice interaction loop.
Break
Part 4 — Text-to-Speech Integration
Participants will add speech synthesis using Piper to generate spoken responses.
Hands-on exercises include:
- Installing and running Piper
- Generating speech from model responses
- Playing synthesized audio locally
- Testing voice responses
At this point, participants will have a complete speech → inference → speech loop running locally.
Part 5 — Exposing the LLM via HTTPS
In this section, participants will add a lightweight reverse proxy in front of the LLM server.
Topics covered:
- Running a local HTTPS endpoint
- Routing requests to the inference server
- Testing with Postman and external tools
- Security considerations for local AI services
Part 6 — Performance Tuning and Extensions
The workshop concludes with practical guidance on improving the system.
Topics include:
- Model size vs latency tradeoffs
- Prompt tuning
- Streaming responses
- Extending the architecture for real applications
Participants will also explore ideas for turning their prototype into a full product or research project.
Learning Outcomes:
By the end of the workshop, participants will be able to:
- Run a local LLM using open-source tools
- Build a modular VoiceAI pipeline
- Integrate speech recognition and speech synthesis locally
- Expose local inference through an HTTPS API
- Evaluate performance tradeoffs when running models on consumer hardware
Participants will also walk away with a working project they can extend after the conference.
Target Audience:
This workshop is designed for:
- Software developers
- Platform engineers
- AI/ML practitioners
- Architects exploring edge AI systems
Participants should have basic familiarity with Python and command-line tools. Prior machine learning experience is not required.
Prerequisites
Participants should bring a laptop capable of running local models.
Recommended environment:
- macOS or Linux (Windows is ok too)
- Python 3.10+
- At least 16GB RAM
- Git and basic CLI familiarity
Setup instructions will be provided before the workshop to ensure attendees can start quickly.
Hussain Abbasi
Founder of intelliAbb
Houston, Texas, United States
Links
Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.
Jump to top