Memory Wall in AI

AI models are hitting a hard limit: not compute, but memory. As model sizes and context windows grow, memory bandwidth and capacity have become the dominant bottlenecks for training and inference. This talk breaks down why the “Memory Wall” is shaping AI’s future, how today’s systems waste cycles on memory stalls, fragmentation, and data movement, and why optimizations like quantization, pruning, and better KV-cache management only postpone hitting that ceiling. We explore real production bottlenecks and propose a memory-first architecture that treats data movement as the primary cost center. Attendees will leave with a clear view of where AI systems must evolve next.
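For a sense of scale, the sketch below estimates the KV-cache footprint of a decoder-only transformer during inference. The model dimensions are illustrative assumptions (roughly a 70B-class model with grouped-query attention), not figures from the talk.

```python
# Back-of-the-envelope KV-cache sizing for a decoder-only transformer.
# All model dimensions below are illustrative assumptions, not talk data.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys AND values (hence the factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class model, fp16 cache, 32K context, batch of 8.
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=32_768, batch=8, bytes_per_elem=2)
print(f"KV cache: {size / 2**30:.1f} GiB")  # -> KV cache: 80.0 GiB
```

At these sizes the cache alone exceeds a single accelerator's memory, and even a 4-bit quantized cache shrinks it only 4x, which is why the abstract frames such optimizations as postponing, rather than removing, the wall.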

Tejas Chopra

Senior Software Engineer, Netflix

San Jose, California, United States
