ML on K8s: Running AI Workloads with KServe and Kubeflow Lite

Machine learning is increasingly moving from notebooks to production, and Kubernetes is where much of that work now happens. However, deploying models at scale with observability, versioning, and autoscaling can get complex fast.

In this session, we’ll explore how KServe, a CNCF incubating project, simplifies the process of serving ML models on Kubernetes. Using a minimal Kubeflow-lite setup, we’ll walk through a live deployment of an ML model (scikit-learn or Hugging Face) and demonstrate production-grade features like autoscaling, traffic splitting, and real-time monitoring.

This talk is aimed at developers, ML engineers, and platform teams looking to operationalize AI workloads without reinventing infrastructure.

What We’ll Cover:
- What is KServe? Where does it fit in the ML + K8s stack?
- How to deploy a lightweight ML model using YAML or CLI
- Autoscaling with Knative integration
- Multi-version model rollout and routing
- Metrics, logs, and basic auth options
- Live traffic simulation to trigger scale-up
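
To give a feel for the deployment, autoscaling, and rollout topics above, here is a minimal sketch of what a KServe `InferenceService` manifest could look like, built as a Python dictionary and printed as JSON. It is an illustration under stated assumptions, not the manifest used in the session: the helper `build_inference_service`, the service name `sklearn-demo`, and the storage URI are all hypothetical placeholders, and the replica and canary values are arbitrary.

```python
import json


def build_inference_service(name, storage_uri, model_format="sklearn",
                            min_replicas=1, max_replicas=3,
                            canary_percent=None):
    """Return a KServe v1beta1 InferenceService manifest as a plain dict.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    predictor = {
        # Replica bounds the autoscaler is allowed to move between.
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "model": {
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        },
    }
    if canary_percent is not None:
        # Share of traffic routed to the newest revision during a rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


if __name__ == "__main__":
    manifest = build_inference_service(
        name="sklearn-demo",  # placeholder service name
        storage_uri="gs://example-bucket/models/sklearn/demo",  # placeholder URI
        canary_percent=10,
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be piped into `kubectl apply -f -` on a cluster where KServe is installed. How the service actually scales depends on the cluster's Knative and KServe configuration.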

Key Takeaways:
1. Understand how KServe simplifies ML inference in Kubernetes
2. Learn how to deploy, scale, and monitor ML endpoints using CNCF tools
3. Gain insight into real-world production patterns for ML model serving
4. Leave with a GitHub repo you can fork to deploy your own models
5. Discover how to bring AI to your K8s cluster in a mini session

Akshay Mittal

Staff Software Engineer | PhD Researcher in Cloud-Native AI/ML | Passionate About Scalable & Intelligent Solutions

Austin, Texas, United States
