KServe and the Next Step for LLM Workloads: LLMInferenceService in Context
Traditional ML and LLM serving share the same needs on Kubernetes: consistent deployment APIs, reliable scaling, and an operational model teams can standardize and automate. This session gives an architecture-first overview of KServe, a Kubernetes-native model serving control plane for both classic inference and LLM workloads (and a CNCF incubating project since November 2025).

We then explain why KServe introduced LLMInferenceService (its llm-d integration): to support LLM-focused serving patterns within the same KServe foundation, with clearly separated responsibilities and a Kubernetes-native request flow.

You'll leave with a solid mental model of how KServe and LLMInferenceService fit together and how to approach adoption on real platforms. We close with a short demo: installing KServe and deploying a minimal inference service (no benchmarking or tuning).
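To give a flavor of the demo's final step, here is a minimal sketch of an InferenceService manifest, modeled on the KServe quickstart example (the service name, model format, and storageUri are illustrative; the runtime and model location depend on your cluster):

```yaml
# Minimal KServe InferenceService, modeled on the KServe quickstart.
# Assumes KServe and a matching model runtime (here sklearn) are installed.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                # illustrative service name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn               # runtime is selected by model format
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model  # example public model
```

Applying this manifest with kubectl creates the service; KServe provisions the predictor and exposes an inference endpoint. LLMInferenceService follows the same declarative pattern for LLM-specific deployments.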