Session

Bringing Generative Art and LLMs to the Edge

This talk focuses on the challenges of running large models across different system architectures and on the need to optimize the models themselves to run efficiently on the edge. We will discuss how to reduce model size and computational complexity, and which techniques can improve inference time. We will also explore the implications of running such models on edge devices, including memory and bandwidth constraints, and how to tune these models for the best performance across different system architectures. Finally, we will show how Kubernetes can be used to deploy your ML models on the edge in an optimal way, how to make the best use of edge hardware accelerators such as GPUs and TPUs, and how technologies like WebAssembly can support model deployments across a wide range of edge architectures.
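One of the model-size reduction techniques alluded to above is quantization: storing weights as low-precision integers instead of 32-bit floats. The sketch below is a minimal, illustrative example of symmetric int8 quantization in plain Python (the function names and toy weights are hypothetical, not from the talk); production workflows would use a framework's quantization tooling instead.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats into [-127, 127]
    using a single per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

# Toy weight tensor: int8 storage is 4x smaller than float32.
weights = [0.52, -1.27, 0.08, 0.91]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Each weight now fits in one byte plus a shared scale, trading a small, bounded rounding error for a 4x reduction in memory and bandwidth, which is exactly the constraint edge devices hit first.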

Shivay Lamba

Developer Relations

New Delhi, India
