KServe and the Next Step for LLM Workloads: LLMInferenceService in Context
Traditional ML and LLM serving share the same needs on Kubernetes: consistent deployment APIs, reliable scaling, and an operational model teams can standardize and automate. This session gives an architecture-first overview of KServe, a Kubernetes-native model serving control plane for both classic inference and LLM workloads (and a CNCF incubating project since November 2025).

We then explain why KServe introduced LLMInferenceService (its llm-d integration): to support LLM-focused serving patterns within the same KServe foundation, with clearly separated responsibilities and a Kubernetes-native request flow.

You'll leave with a solid mental model of how KServe and LLMInferenceService fit together and how to approach adoption on real platforms. We close with a short demo: installing KServe and deploying a minimal inference service (no benchmarking or tuning).
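To give a flavor of the demo's final step, here is a minimal sketch of an InferenceService manifest, modeled on the KServe quickstart example (the service name, model format, and storageUri are illustrative; the runtime and model location depend on your cluster):

```yaml
# Minimal KServe InferenceService, modeled on the KServe quickstart.
# Assumes KServe and a matching model runtime (here sklearn) are installed.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                # illustrative service name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn               # runtime is selected by model format
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model  # example public model
```

Applying this manifest with kubectl creates the service; KServe provisions the predictor and exposes an inference endpoint. LLMInferenceService follows the same declarative pattern for LLM-specific deployments.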