Session

Improve AI inference (serving models) with KServe and vLLM

Red Hat integrates and supports both KServe and vLLM in its MLOps platform, OpenShift AI. In addition, Red Hat engineers actively contribute to the KServe and vLLM upstream projects every day.

In this session, we'll talk about:
- a brief introduction to Red Hat OpenShift AI, describing its components at a high level, all of which come from open source projects
- how KServe fits into OpenShift AI, and the benefits of KServe as a model serving platform
- going one step further, how choosing KServe as the model serving platform and vLLM as the runtime for LLMs enables:
  - faster inference and optimized resource consumption, with techniques such as continuous batching, PagedAttention, and speculative decoding
  - further optimized resource consumption through LLM quantization with vLLM's LLM Compressor library (a minimal sketch follows this list)
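As a rough illustration of the vLLM side of the stack (not taken from the session itself), the sketch below uses vLLM's offline Python API. The model name, prompts, and sampling settings are placeholder assumptions. Continuous batching and PagedAttention are built into vLLM's engine, so no extra configuration is needed to benefit from them, and a checkpoint quantized with LLM Compressor can be loaded the same way as an unquantized one.

```python
# Minimal sketch of offline inference with vLLM, assuming vLLM is installed
# and a GPU is available. The model name is only an example; any model
# supported by vLLM (including checkpoints quantized with LLM Compressor)
# can be substituted.
from vllm import LLM, SamplingParams

# vLLM's engine applies continuous batching and PagedAttention automatically.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model, not prescribed by the session

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain what model serving means in one sentence.",
    "What is PagedAttention?",
]

# generate() batches the prompts through the engine and returns one result per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

On OpenShift AI, the same model would typically not be called in-process like this, but exposed through a KServe InferenceService backed by a vLLM serving runtime and queried over its HTTP endpoint.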

Matteo Combi

Specialist Solutions Architect, Application Platform - Red Hat

Milan, Italy
