Session
Memory Wall in AI
AI models are hitting a hard limit: not compute, but memory. As model sizes and context windows grow, memory bandwidth and capacity have become the dominant bottlenecks for training and inference. This talk breaks down why the “Memory Wall” is shaping AI’s future, how today’s systems waste cycles on memory stalls, fragmentation, and data movement, and why optimizations like quantization, pruning, and better KV-cache management only delay hitting the ceiling. We explore real production bottlenecks and propose a memory-first architecture that treats data movement as the primary cost center. Attendees leave with a clear view of where AI systems must evolve next.
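To make the capacity pressure concrete, here is a minimal back-of-envelope sketch (not from the talk itself) of how a transformer's KV cache grows with context length during inference; the model dimensions below are illustrative assumptions for a hypothetical 70B-class model, not figures from the session.

    # Sketch: KV-cache memory estimate. All parameter values are assumptions.
    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes):
        # Two cached tensors (K and V) per layer,
        # each shaped [batch, kv_heads, seq_len, head_dim].
        return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

    # Hypothetical 70B-class model serving a 128k-token context at fp16.
    gib = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                         seq_len=131072, batch=1, dtype_bytes=2) / 2**30
    print(f"KV cache: {gib:.1f} GiB per sequence")  # 40.0 GiB

Under these assumptions a single long-context sequence consumes tens of gigabytes of accelerator memory before any batching, which is why the abstract frames quantization and KV-cache management as delaying, rather than removing, the ceiling.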
Tejas Chopra
Senior Software Engineer, Netflix
San Jose, California, United States