
Wentao Liu
Business Development Manager at ictrek.com
Wentao has more than 20 years of experience in the Linux community. Early in his Linux career, he applied LVS (Linux Virtual Server) technology to build Layer 4 switches. Now, as an advocate of LLMs and cloud-native technologies, he focuses on AI agents, LLMs and RAG, and AI/ML integration with Kubernetes, especially MLOps and speech-to-text application integration. He has spoken at several open source conferences, including FOSSASIA, ApacheCon, and FOSDEM, to share his experience with open source technologies.
Running an AI Agent with Dify and DeepSeek-R1 on vLLM
This talk explores the streamlined deployment of scalable AI agents using open-source tools. Attendees will learn how Dify's LLMOps platform simplifies workflow orchestration and monitoring for LLMs, paired with vLLM's high-performance inference engine for cost-efficient, low-latency serving. The session demonstrates how to integrate these tools to optimize resource utilization, accelerate model iteration, and manage complex AI agent pipelines, from fine-tuning to production. Real-world use cases (e.g., chatbots and automation systems) will highlight best practices for balancing speed, accuracy, and scalability. Developers and ML engineers will gain actionable insights into overcoming GPU constraints, reducing inference costs, and leveraging Kubernetes-native workflows for enterprise-grade LLM operations. Ideal for teams adopting open-source AI/ML stacks, this talk bridges the gap between experimental models and robust, maintainable deployments.
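To give a flavor of the integration, here is a minimal sketch of querying a DeepSeek-R1 model served by vLLM's OpenAI-compatible server, which is what a Dify agent would call under the hood. The model name, port, and launch command are illustrative assumptions, not the exact setup from the talk.

```python
# A minimal sketch: query DeepSeek-R1 served by vLLM's OpenAI-compatible API.
# Assumes vLLM was started separately, for example:
#   vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --port 8000
# The model name and port here are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not check the key by default
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Summarize what an AI agent does."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```

Because vLLM speaks the OpenAI protocol, Dify can point at this endpoint as a custom model provider without any code changes on the agent side.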
How to deploy Whisper Web on Minikube
This session dives into deploying the open-source Whisper Web, a browser-based ML speech recognition tool, on Minikube. Attendees will learn to containerize the React/Node.js frontend and the PyTorch-backed Whisper model using Docker, shrink images via multi-stage builds (reducing size by 97%), and configure Kubernetes Deployments and Services for scalability. The demo showcases Minikube cluster setup, GPU-accelerated inference, and handling challenges such as proxy configuration and offline image mirroring. Practical takeaways include YAML best practices, horizontal scaling, and leveraging Kubernetes for local development. Ideal for developers exploring GenAI deployment, the session bridges cloud-native principles with real-world AI application workflows, empowering teams to adopt portable, cost-efficient solutions without compromising performance.
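To make the Deployment and Service configuration concrete, here is a minimal sketch that creates both objects from Python using the official Kubernetes client. The image name, ports, and labels are illustrative assumptions; they are not the exact manifests from the talk.

```python
# A minimal sketch, assuming a Whisper Web image has already been built and
# loaded into Minikube (e.g. `minikube image load whisper-web:latest`).
# The image name, port, and labels are assumptions for illustration.
from kubernetes import client, config, utils

config.load_kube_config()  # use the local Minikube kubeconfig
api = client.ApiClient()

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "whisper-web"},
    "spec": {
        "replicas": 2,  # horizontal scaling: raise replicas as load grows
        "selector": {"matchLabels": {"app": "whisper-web"}},
        "template": {
            "metadata": {"labels": {"app": "whisper-web"}},
            "spec": {
                "containers": [{
                    "name": "whisper-web",
                    "image": "whisper-web:latest",
                    "imagePullPolicy": "IfNotPresent",  # use the locally loaded image
                    "ports": [{"containerPort": 3000}],
                }]
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "whisper-web"},
    "spec": {
        "type": "NodePort",  # reachable via `minikube service whisper-web`
        "selector": {"app": "whisper-web"},
        "ports": [{"port": 80, "targetPort": 3000}],
    },
}

utils.create_from_dict(api, deployment)
utils.create_from_dict(api, service)
print("Deployed; run `minikube service whisper-web` to open the app.")
```

The same dictionaries map one-to-one onto the YAML manifests discussed in the session, so either form can be kept under version control.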
Build your first RAG with Dify LLMOps and the Milvus vector database
Learn how to build a scalable, open-source Retrieval-Augmented Generation (RAG) pipeline using Dify (a no-code LLM app framework) and Milvus (a high-performance vector database). This session will guide developers and AI practitioners through integrating Dify's intuitive interface for prompt engineering and workflow orchestration with Milvus's lightning-fast semantic search capabilities. Using a RAG example from Zilliz's online guide, the speaker will demonstrate how to ingest, index, and retrieve unstructured data efficiently, enabling context-aware AI applications such as chatbots, knowledge bases, and analytical tools. Attendees will gain actionable insights into optimizing accuracy, latency, and cost in RAG systems while leveraging open-source tools. A live demo will showcase end-to-end implementation, from dataset preparation to deployment, with reference to best practices outlined in Zilliz's guide. Ideal for developers seeking to harness generative AI without vendor lock-in, this talk bridges the gap between cutting-edge research and real-world deployment, emphasizing modularity, transparency, and community-driven innovation.
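As a rough illustration of the ingest-index-retrieve loop, here is a minimal sketch with pymilvus, loosely following the pattern in Zilliz's online guides. The embedding model, dimension, and collection name are assumptions, and in practice Dify would drive these steps through its UI rather than hand-written code.

```python
# A minimal RAG-indexing sketch with pymilvus and Milvus Lite.
# Embedding model, dimension, and collection name are assumptions.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim embeddings
client = MilvusClient("milvus_demo.db")  # Milvus Lite: local, file-backed

docs = [
    "Milvus is an open-source vector database.",
    "Dify is a no-code platform for building LLM applications.",
    "RAG retrieves relevant context before generation.",
]

# Ingest and index: one embedding vector per document chunk.
client.create_collection(collection_name="rag_docs", dimension=384)
client.insert(
    collection_name="rag_docs",
    data=[
        {"id": i, "vector": encoder.encode(d).tolist(), "text": d}
        for i, d in enumerate(docs)
    ],
)

# Retrieve: find the chunks most similar to the user's question.
question = "What does Dify do?"
hits = client.search(
    collection_name="rag_docs",
    data=[encoder.encode(question).tolist()],
    limit=2,
    output_fields=["text"],
)
for hit in hits[0]:
    print(hit["entity"]["text"], hit["distance"])
```

The retrieved chunks would then be stitched into the LLM prompt, which is exactly the step Dify's workflow orchestration automates.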
How to deploy GenAI applications on Minikube
As the leading platform for managing and scaling containerized applications, Kubernetes can greatly ease the deployment of AI/ML applications. The speaker will discuss a methodology for deploying GenAI applications on Minikube. He will begin with a general introduction to the Hugging Face Whisper web GenAI application, a model that handles speech-to-text (STT) tasks. He will then discuss the regular Kubernetes deployment method, illustrating techniques such as containerizing the Whisper web application and setting up a Kubernetes cluster, and analyzing the Deployment and Service configuration files. He will also introduce an alternative deployment approach using KubeAI and Helm, one of the easiest ways to serve ML models in production. Concepts such as the speech-to-text API and autoscaling will be explained, and the architecture of KubeAI will be illustrated. Finally, he will give a live demo of the Whisper application deployed on Minikube via a classic LoadBalancer.
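To sketch the KubeAI route, here is a minimal example of calling a Whisper model served by KubeAI through its OpenAI-compatible speech-to-text API. The Helm install steps, model name, and endpoint path are assumptions based on KubeAI's documentation style, not the exact demo from the talk.

```python
# A minimal sketch: transcribe audio via KubeAI's OpenAI-compatible API.
# Assumes KubeAI was installed with Helm and port-forwarded, for example:
#   helm install kubeai kubeai/kubeai
#   kubectl port-forward svc/kubeai 8000:80
# The model name below is an assumed entry from KubeAI's model catalog.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/openai/v1",  # KubeAI's OpenAI-compatible route
    api_key="ignored",  # KubeAI does not require a real key by default
)

with open("sample.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="faster-whisper-medium-en-cpu",  # assumed catalog model name
        file=audio,
    )
print(transcript.text)
```

Because the API mirrors OpenAI's, existing STT clients can switch to the cluster-hosted model by changing only the base URL, while KubeAI handles autoscaling behind the scenes.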