Operating Distributed Inference Systems at Scale

Inference has rapidly become one of the most important infrastructure problems in modern computing. As AI systems evolve into autonomous agents with persistent memory, tool use, and multi-step reasoning, traditional inference architectures struggle to meet growing demands for low latency, high throughput, cost efficiency, and reliability.

In this talk, I’ll share lessons from building large-scale elastic compute and AI infrastructure that powers production workloads. We’ll explore the modern inference stack and the architectural patterns emerging to support next-generation agentic AI systems.

Topics include:

Distributed inference architectures for large-scale AI systems
GPU scheduling and elastic compute for inference workloads
Multi-tenant inference infrastructure
Caching, batching, and latency optimization strategies
Reliability and fault isolation for inference systems
Observability and control loops for AI serving platforms
Balancing cost, throughput, and user experience
Why inference is becoming an infrastructure orchestration problem

Attendees will gain practical insights into designing scalable, resilient, and cost-efficient inference platforms for modern AI workloads.

Nishant Gupta

Tech Lead, Software Engineering @ Meta SuperIntelligence Lab (MSL) • AI Infrastructure • Distributed Systems • Researcher • Speaker • Startup Advisor

San Francisco, California, United States
