Operating Distributed Inference Systems at Scale
Inference has rapidly become one of the most important infrastructure problems in modern computing. As AI systems evolve into autonomous agents with persistent memory, tool usage, and multi-step reasoning, traditional inference architectures struggle under growing demands for low latency, high throughput, cost efficiency, and reliability.
In this talk, I’ll share lessons from building large-scale elastic compute and AI infrastructure systems powering production workloads. We’ll explore the modern inference stack and the architectural patterns emerging to support next-generation agentic AI systems.
Topics include:
Distributed inference architectures for large-scale AI systems
GPU scheduling and elastic compute for inference workloads
Multi-tenant inference infrastructure
Caching, batching, and latency optimization strategies
Reliability and fault isolation for inference systems
Observability and control loops for AI serving platforms
Balancing cost, throughput, and user experience
Why inference is becoming an infrastructure orchestration problem
Attendees will gain practical insights into designing scalable, resilient, and cost-efficient inference platforms for modern AI workloads.
Nishant Gupta
Tech Lead, Software Engineering @ Meta SuperIntelligence Lab (MSL) • AI Infrastructure • Distributed Systems • Researcher • Speaker • Startup Advisor
San Francisco, California, United States