Intelligent LLM Routing: A New Paradigm for Multi-Model AI Orchestration in Kubernetes

This research-driven talk introduces a novel architectural paradigm that complements recent advances in intelligent inference routing for large language models. By integrating proxy-based classification and reranking techniques, we've developed a system that efficiently routes incoming prompts to domain-specialized LLMs based on rapid content analysis. Our approach creates a meta-layer of intelligence above traditional model-serving infrastructure, enabling specialized models to handle the queries they're optimized for while maintaining a unified API interface. We'll present performance research comparing this distributed approach against monolithic inference-time scaling, demonstrating how intelligent routing can achieve superior results for complex, multi-domain workloads while reducing computational overhead. The session includes a Kubernetes-based reference implementation and quantitative analysis of throughput, latency, and accuracy across diverse prompt categories.
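To make the routing idea concrete, here is a minimal, hypothetical sketch of the proxy's decision step: a lightweight classifier scores the incoming prompt against each domain and the proxy forwards it to the best-matching specialized backend, while callers see only one entry point. All names (`route_prompt`, `DOMAIN_MODELS`, the keyword-overlap scorer) are illustrative assumptions, not the talk's actual implementation, which would use a learned classifier and reranker behind a Kubernetes service.

```python
# Hypothetical proxy-routing sketch: keyword overlap stands in for a
# learned prompt classifier; each Backend represents a domain-specialized
# LLM served behind the unified API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    name: str               # specialized model served behind the proxy
    keywords: frozenset     # crude stand-in for a trained domain classifier

DOMAIN_MODELS = [
    # Fallback listed first so it wins ties when no domain matches.
    Backend("general-llm", frozenset()),
    Backend("code-llm", frozenset({"python", "function", "compile", "bug"})),
    Backend("math-llm", frozenset({"integral", "proof", "equation", "solve"})),
]

def route_prompt(prompt: str) -> str:
    """Return the name of the backend whose domain best matches the prompt."""
    tokens = set(prompt.lower().split())
    # Score each backend by domain overlap; max keeps the first (fallback)
    # entry when every specialized backend scores zero.
    best = max(DOMAIN_MODELS, key=lambda b: len(b.keywords & tokens))
    return best.name
```

In a real deployment, this decision would run in the proxy layer (e.g. as a sidecar or gateway filter) ahead of the model-serving pods, so adding a new specialized model only means registering another backend, not changing the client-facing API.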

Chen Wang

IBM, Senior Research Scientist

Chappaqua, New York, United States
