Introducing LLM Instance Gateways for Efficient Inference Serving
Large Language Models (LLMs) are revolutionizing applications, but serving them efficiently in production remains a challenge. Existing API endpoints, load balancers, and gateways focus on HTTP/gRPC traffic, a well-defined space. LLM traffic is fundamentally different: a request is characterized by factors such as the size of the prompt and the size and efficiency of the model serving it.
Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.
What will you learn? The core challenges of LLM inference serving: the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.
We will then dive into how LLM Instance Gateways work: how they route requests, manage resources, and ensure fairness among different LLM use cases.
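To make the routing idea concrete, here is a minimal sketch of what such a gateway might do: pick the least-loaded instance serving the requested model, with a simple per-use-case concurrency quota for fairness. This is not the speakers' implementation; all names, quotas, and the routing policy are illustrative assumptions.

```python
# Illustrative sketch only: a toy gateway that routes requests to LLM
# instances. Real gateways use richer signals (queue depth, KV-cache
# pressure, latency SLOs) and real fairness policies.
from dataclasses import dataclass, field

@dataclass
class Instance:
    name: str
    model: str
    in_flight: int = 0  # outstanding requests on this replica

@dataclass
class Gateway:
    instances: list[Instance]
    quota: int = 4  # max concurrent requests per use case (assumed policy)
    usage: dict[str, int] = field(default_factory=dict)

    def route(self, use_case: str, model: str) -> Instance | None:
        # Fairness: reject (or queue) once a use case exceeds its quota.
        if self.usage.get(use_case, 0) >= self.quota:
            return None
        # Routing: least-loaded instance that serves the requested model.
        candidates = [i for i in self.instances if i.model == model]
        if not candidates:
            return None
        chosen = min(candidates, key=lambda i: i.in_flight)
        chosen.in_flight += 1
        self.usage[use_case] = self.usage.get(use_case, 0) + 1
        return chosen

    def release(self, use_case: str, instance: Instance) -> None:
        # Call when a request completes, freeing capacity and quota.
        instance.in_flight -= 1
        self.usage[use_case] -= 1

gw = Gateway([Instance("pod-a", "llama-70b"), Instance("pod-b", "llama-70b")])
print(gw.route("chat", "llama-70b").name)  # -> pod-a (least loaded)
```

The point of the sketch is the separation of concerns: routing decides *where* a request runs on shared infrastructure, while the fairness policy decides *whether* it runs now, both of which the session explores in depth.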