Session

Memory Optimizations for Machine Learning

As Machine Learning moves into more industries and applications, optimizing computational resources, particularly memory, has become critical to effective model deployment. This session, "Memory Optimizations for Machine Learning," takes a detailed look at the memory requirements of Machine Learning tasks, including Large Language Models (LLMs), and at current strategies for reducing memory consumption.

We'll begin by breaking down the memory footprint of typical Machine Learning data structures and algorithms, including how memory is allocated and deallocated during model training. The talk will then focus on memory-saving techniques such as data quantization, model pruning, and efficient mini-batch selection. These techniques can conserve memory without significant degradation in model performance.
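
To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization using only the Python standard library. It is an illustration under stated assumptions, not the method presented in the session: the helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and production frameworks typically use per-channel scales and calibration data.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [scale * v for v in q]


# Hypothetical weights, stored as 32-bit floats (4 bytes each).
weights = array("f", [0.5, -1.0, 0.25, 0.0, 0.75])
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = weights.itemsize * len(weights)  # 4 bytes per value
int8_bytes = q.itemsize * len(q)              # 1 byte per value
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest round-trip error: {max_error:.5f} (step size {scale:.5f})")
```

Storage drops by a factor of four, and the round-trip error of each weight is bounded by half a quantization step. Whether that error is acceptable depends on the model and task.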

Special emphasis will be placed on the memory footprint of LLMs during inference. Because of their size and complexity, LLMs pose unique memory challenges at deployment time. We will explore the factors that contribute to this footprint, such as model architecture, input sequence length, and vocabulary size. We will also discuss practical strategies for reducing memory usage during LLM inference, including model distillation, dynamic memory allocation, and efficient caching mechanisms.
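
As a rough illustration of how architecture, sequence length, and vocabulary size drive inference memory, the sketch below estimates the key/value cache and embedding table sizes for a decoder-only transformer. The configuration values (32 layers, 32 key/value heads, head dimension 128, hidden size 4096, vocabulary of 32,000, 2 bytes per value) are hypothetical and chosen only to make the arithmetic easy to follow. The estimate ignores the remaining weights, activations, and allocator overhead.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and token."""
    # The leading factor of 2 accounts for storing both keys and values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def embedding_bytes(vocab_size, hidden_size, bytes_per_value=2):
    """Bytes needed for the token embedding table."""
    return vocab_size * hidden_size * bytes_per_value


# Hypothetical model configuration, assuming 16-bit values.
config = dict(num_layers=32, num_kv_heads=32, head_dim=128)

kv_short = kv_cache_bytes(seq_len=1024, **config)
kv_long = kv_cache_bytes(seq_len=4096, **config)
emb = embedding_bytes(vocab_size=32_000, hidden_size=4096)

print(f"KV cache at 1024 tokens: {kv_short / GIB:.2f} GiB")
print(f"KV cache at 4096 tokens: {kv_long / GIB:.2f} GiB")
print(f"Embedding table: {emb / GIB:.2f} GiB")
```

The cache grows linearly with sequence length and batch size, which is why long-context serving is often limited by cache memory rather than by model weights.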

By the end of this session, attendees will understand the main memory optimization techniques for Machine Learning, with a particular focus on the challenges and solutions related to LLM inference.

Tejas Chopra

Senior Software Engineer, Netflix

San Jose, California, United States


