Most Active Speaker

Abdel Sghiouar

Cloud Developer Advocate

Abdel Sghiouar is a senior Cloud Developer Advocate @Google Cloud, a co-host of the Kubernetes Podcast by Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh, and Serverless. Abdel started his career in datacenters and infrastructure in Morocco, where he is originally from, before moving to Google's largest EU datacenter in Belgium. He then joined Google Cloud Professional Services in Sweden and spent five years working with Google Cloud customers on architecting and designing large-scale distributed systems before turning to advocacy and community work.

Awards

  • Most Active Speaker 2024

Hands-on with Ray on Kubernetes

The rapidly evolving landscape of Machine Learning and Large Language Models demands efficient, scalable ways to run distributed workloads to train, fine-tune, and serve models. Ray is an Open Source framework that simplifies distributed machine learning, and Kubernetes streamlines deployment.
In this hands-on session we will explore Ray as a framework and how it integrates with Kubernetes to run scalable distributed machine learning workloads. We will cover Ray scalability, patterns for running RayJobs and RayServe, and best practices for creating multi-tenant ML platforms using Ray on Kubernetes with fair sharing of scarce hardware accelerators. Along the way, we'll uncover how to combine Ray and Kubernetes for your ML projects.
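
To make the Ray programming model concrete, here is a minimal sketch of Ray's task API. It assumes a RayCluster created by the KubeRay operator whose head service is reachable at the illustrative address below; the workload is a stand-in for real preprocessing or training.

```python
# A minimal Ray tasks sketch; the cluster address and the workload are illustrative.
import ray

# Connect to an existing Ray cluster (e.g. one created by the KubeRay operator).
ray.init(address="ray://raycluster-kuberay-head-svc:10001")

@ray.remote(num_cpus=1)
def square_sum(shard: list[int]) -> int:
    # Stand-in for real per-shard work such as feature engineering.
    return sum(x * x for x in shard)

# Fan the shards out across the cluster, then gather the partial results.
futures = [square_sum.remote(list(range(i, i + 1_000))) for i in range(0, 10_000, 1_000)]
print(sum(ray.get(futures)))
```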

What in the mess is a Service Mesh?

Service Mesh is becoming a key component in the Cloud Native world. It allows Dev and Ops teams to connect, secure, and observe applications without mixing business logic with infrastructure concerns. This way, teams can focus on delivering value, letting the mesh do all the complex non-functional work like service discovery, load balancing, encryption, authentication, authorization, support for the circuit breaker pattern, and other capabilities. Istio is one of the major Open Source Service Mesh options available today. In this session, you will gain a basic understanding of Istio's concepts.

What’s new in the Kubernetes Gateway API

The Gateway API was introduced to Kubernetes in 2019. The project is making steady progress toward becoming the single expressive API for inbound traffic that is portable, extensible, and role-oriented, with over 20 implementations and multiple objects recently reaching GA. This session explores what's happening in the project: what is the state of the API and the various implementations? We will also cover the GAMMA initiative, which aims to use the Gateway API as a standard way to describe East-West traffic (AKA mesh traffic).
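
As a rough illustration of the API's role-oriented model, the sketch below creates an HTTPRoute with the Kubernetes Python client. It assumes a cluster that already has a Gateway API implementation installed and an existing Gateway named example-gateway; every resource name here is hypothetical.

```python
# A minimal sketch: create an HTTPRoute that attaches to an existing Gateway.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a pod

# The HTTPRoute expressed as a plain dict; in practice this is usually written as YAML.
http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "store-route", "namespace": "default"},
    "spec": {
        "parentRefs": [{"name": "example-gateway"}],
        "rules": [
            {
                "matches": [{"path": {"type": "PathPrefix", "value": "/store"}}],
                "backendRefs": [{"name": "store-svc", "port": 8080}],
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="gateway.networking.k8s.io",
    version="v1",
    namespace="default",
    plural="httproutes",
    body=http_route,
)
```

The split shown here is the role-oriented part: the application team owns the HTTPRoute, while the Gateway it attaches to is typically managed by the platform team.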

Working with Gemma and Open LLMs on Google Kubernetes Engine

The Gemma family of open models can be fine-tuned on your own custom dataset to perform a variety of tasks, such as text generation, translation, and summarization. Combined with Kubernetes, you can unlock open source AI innovation with scalability, reliability, and ease of management.

In this workshop, you will learn through a guided hands-on exercise how you can work with Gemma and fine-tune it on a Kubernetes cluster. We will also explore options for serving Gemma on Kubernetes with accelerators and Open Source tools.
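
As a rough preview of the fine-tuning part, here is a minimal LoRA sketch built on the Hugging Face stack. It assumes transformers, peft, datasets, and accelerate are installed, access to the google/gemma-2b weights has been granted, and a GPU is available; the dataset and hyperparameters are illustrative, not the workshop's actual material.

```python
# A minimal LoRA fine-tuning sketch; dataset and hyperparameters are illustrative.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Train small LoRA adapters instead of updating all of the base model's weights.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def tokenize(row):
    return tokenizer(row["instruction"] + "\n" + row["response"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-lora", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = input_ids
).train()
```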

Introducing LLM Instance Gateways for Efficient Inference Serving

Large Language Models (LLMs) are revolutionizing applications, but efficiently serving them in production is a challenge. Existing API endpoints, LoadBalancers, and Gateways focus on HTTP/gRPC traffic, which is already a well-defined space. LLM traffic is completely different: a request to an LLM is usually characterized by the size of the prompt, the size and efficiency of the model, and similar factors.

Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.

What will you learn? The core challenges of LLM inference serving: understanding the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.

We will dive into how LLM Instance Gateways work: how they route requests, manage resources, and ensure fairness among different LLM use cases.
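
To illustrate the kind of decision such a gateway makes, here is a deliberately simplified routing sketch. It is not the actual gateway implementation: the backends, the queue-depth and KV-cache signals, and the thresholds are all hypothetical.

```python
# A toy illustration of LLM-aware routing; all names, metrics and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    model: str
    queue_depth: int            # outstanding requests reported by the serving engine
    free_kv_cache_ratio: float  # fraction of KV-cache memory still available

def pick_backend(prompt: str, model: str, backends: list[Backend]) -> Backend:
    """Route to the least loaded replica serving the requested model,
    penalizing nearly-full replicas when the prompt is long."""
    candidates = [b for b in backends if b.model == model]
    long_prompt = len(prompt.split()) > 1024

    def cost(b: Backend) -> float:
        penalty = 10.0 if long_prompt and b.free_kv_cache_ratio < 0.2 else 0.0
        return b.queue_depth + penalty

    return min(candidates, key=cost)

backends = [
    Backend("vllm-0", "gemma-2b", queue_depth=3, free_kv_cache_ratio=0.6),
    Backend("vllm-1", "gemma-2b", queue_depth=1, free_kv_cache_ratio=0.1),
]
print(pick_backend("Summarize this article ...", "gemma-2b", backends).name)
```

A real gateway would pull such signals from the serving engines' metrics rather than hard-code them, but the routing decision it makes is of this shape.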

Bringing GenAI to the Modern Enterprise. A production use-case. In Serverless Java!

Generative AI adoption starts from business needs, not technological aspects.

Enterprises constantly strive for a competitive edge through technology, and LLM solutions offer unique potential. However, this happens only once we clearly understand that our business requirements transcend our current technical capabilities.

Let's roll up our sleeves and learn hands-on how to build, test and deploy cutting-edge, powerful Gen AI applications in the Modern Enterprise, in a Serverless environment, using Java, AI orchestration frameworks and multiple LLMs, with a concrete, real-world production use-case as a backdrop.
The workshop empowers the enterprise Java developer to unlock new, creative possibilities for their Java apps and build features in novel ways.
It caters to the seasoned Java developer in equal measure as to the curious newcomer to GenAI and is crafted as a follow-along workshop.
What are you going to leave this session with:
- a well-balanced, end-to-end, multi-modal RAG application built in Java, ready to run in the cloud and serve as a reference architecture for a modern generative AI enterprise app
- the same solution implemented in BOTH Spring AI and LangChain4j, today's dominant Java AI orchestration frameworks
- experience deploying Gen AI apps to Cloud Run, a serverless environment
- experience using multiple LLMs deployed in:
  - managed environments - Google Vertex AI
  - local environments - Ollama, Testcontainers
  - Kubernetes - vLLM, an optimized LLM serving engine
- the full codebase, configuration, and deployment instructions

Advanced Ray for distributed ML on Kubernetes

Modern machine learning workloads demand scalable, flexible infrastructure that can handle complex computational requirements. This talk explores how Ray, an open-source unified framework, makes distributed machine learning on Kubernetes easier with its advanced capabilities.

In this talk we will explore Ray's integration with Kubernetes to run scalable distributed machine learning workloads. We will cover Ray scalability, patterns for running RayJobs and RayServe, and best practices for creating multi-tenant ML platforms using Ray on Kubernetes with fair sharing of scarce hardware accelerators.
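
For a flavor of the serving side, here is a minimal Ray Serve sketch. It assumes a running Ray cluster (for example one created by the KubeRay operator) with ray[serve] installed, and uses a trivial echo deployment in place of a real model.

```python
# A minimal Ray Serve sketch; the deployment body is a stand-in for real model inference.
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2, ray_actor_options={"num_cpus": 1})
class Echoer:
    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # A real deployment would run model inference here.
        return {"echo": payload.get("prompt", "")}

# Bind and run the deployment; Serve exposes it over HTTP on the cluster.
serve.run(Echoer.bind(), route_prefix="/generate")
```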

Yes you can run LLMs on Kubernetes

As LLMs become increasingly powerful and ubiquitous, the need to deploy and scale these models in production environments grows. However, the complexity of LLMs can make them challenging to run reliably and efficiently. In this talk, we'll explore how Kubernetes can be leveraged to run LLMs at scale.

We'll cover the key considerations and best practices for packaging LLM inference services as containerized applications using popular OSS inference servers like TGI, vLLM and Ollama, and deploying them on Kubernetes. This includes managing model weights, handling dynamic batching and scaling, implementing advanced traffic routing, and ensuring high availability and fault tolerance.

Additionally, we'll discuss accelerator management and serving models on multiple hosts. By the end of this talk, attendees will have a comprehensive understanding of how to successfully run their LLMs on Kubernetes, unlocking the benefits of scalability, resilience, and DevOps-friendly deployments.
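
As a small client-side sketch, the snippet below queries a vLLM server through its OpenAI-compatible API. It assumes the server is already deployed behind a Kubernetes Service named vllm-svc (a hypothetical name) and was started with the model referenced below.

```python
# A minimal client sketch against a vLLM server's OpenAI-compatible endpoint.
from openai import OpenAI

# The Service name and port are assumptions about how the server was deployed.
client = OpenAI(base_url="http://vllm-svc:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="google/gemma-2b-it",  # must match the model the server was started with
    messages=[{"role": "user", "content": "Explain Kubernetes in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```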

Introduction to Distributed ML Workloads with Ray on Kubernetes

The rapidly evolving landscape of Machine Learning and Large Language Models demands efficient, scalable ways to run distributed workloads to train, fine-tune, and serve models. Ray is an Open Source framework that simplifies distributed machine learning, and Kubernetes streamlines deployment. In this introductory talk, we'll uncover how to combine Ray and Kubernetes for your ML projects. You will learn about:
- Basic Ray concepts (actors, tasks) and their relevance to ML (see the sketch after this list)
- Setting up a simple Ray cluster within Kubernetes
- Running your first distributed ML training job
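
Here is the promised minimal actor sketch. Calling ray.init() with no address starts a local Ray instance, or you can point it at an existing cluster such as one created by the KubeRay operator.

```python
# A minimal Ray actor sketch; the counter stands in for a stateful worker.
import ray

ray.init()  # local instance, or pass address="ray://<head-svc>:10001" for a cluster

@ray.remote
class Counter:
    """A tiny stateful worker; real actors typically hold a model or a data shard."""
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> int:
        self.value += 1
        return self.value

counter = Counter.remote()  # the actor lives in its own worker process
results = ray.get([counter.increment.remote() for _ in range(5)])
print(results)  # [1, 2, 3, 4, 5]
```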

Improving Developer productivity with API-first tooling

Even AI is good enough to generate YAML now. However, to secure your job, you will need tools that use more complex general-purpose programming languages.
Jokes aside, we'll look at a fleet of tools you can use for common configuration and environment setup tasks without touching any YAML. From creating complex CI pipelines to setting up a local development environment to managing your cloud infrastructure and deployments, many essential parts of cloud-native project setup can be done in your favorite programming languages. You'll learn how open-source tools abstract container configuration and management (Testcontainers), build tool actions (Dagger), and cloud infrastructure setup (Pulumi).

This session will teach you how to use these tools and give you enough time to start building customized experiences for your application development teams.
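
As a taste of the API-first approach, here is a minimal sketch using testcontainers-python (the session itself demonstrates the Java flavor of Testcontainers alongside Dagger and Pulumi). It assumes Docker is available locally and that the testcontainers, sqlalchemy, and psycopg2 packages are installed.

```python
# A minimal testcontainers-python sketch: the container lifecycle lives in code,
# not in a docker-compose file. Image tag and query are illustrative.
import sqlalchemy
from testcontainers.postgres import PostgresContainer

with PostgresContainer("postgres:16") as postgres:
    # The container is started for the duration of the block and cleaned up afterwards.
    engine = sqlalchemy.create_engine(postgres.get_connection_url())
    with engine.connect() as connection:
        print(connection.execute(sqlalchemy.text("select version()")).scalar())
```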

End to End ML with Kubernetes, Ray and Java

The rapidly evolving landscape of Machine Learning (ML) and Large Language Models (LLMs) demands efficient, scalable ways to run distributed workloads for training, fine-tuning, serving, and consuming models. Java, the dominant language in enterprise environments, faces pressure to not only modernize application stacks but also to embrace AI, driven by business needs and the myriad possibilities AI offers.
In this context, LangChain4j has emerged as the leading framework for building GenAI applications on the JVM. However, the challenge extends beyond simply calling an LLM from a Java application: how does one build an end-to-end platform from data to a working application? This is where Ray and Kubernetes come into play. Ray, an open-source framework, simplifies distributed machine learning, while Kubernetes streamlines deployment.
This deep-dive session will explore how to combine Java, LangChain4j, Ray, and Kubernetes for ML applications.

Distributed Fine Tuning of Open LLMs on Kubernetes

Open LLMs are a family of ML models that can be fine-tuned on your own custom dataset to perform a variety of tasks, such as text generation, translation, and summarization. Combined with Kubernetes, you can unlock open source AI innovation with scalability, reliability, and ease of management.
In this session, we will deep dive into how you can fine-tune Open LLMs on a Kubernetes cluster. We will also explore options for serving LLMs on Kubernetes with accelerators and Open Source tools.
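
As a rough sketch of what "distributed" means here, the skeleton below uses PyTorch DistributedDataParallel. It assumes it is launched with torchrun (for example from a Kubernetes Job or a Kubeflow PyTorchJob), and it substitutes a tiny model and random data for a real open LLM and dataset.

```python
# A minimal distributed-training skeleton; launch with: torchrun --nproc_per_node=N train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

use_cuda = torch.cuda.is_available()
dist.init_process_group(backend="nccl" if use_cuda else "gloo")
# Simplified single-node mapping of rank to device; multi-node setups use LOCAL_RANK.
local_rank = dist.get_rank() % (torch.cuda.device_count() if use_cuda else 1)
device = torch.device(f"cuda:{local_rank}") if use_cuda else torch.device("cpu")

# A tiny model stands in for a real open LLM; DDP keeps gradients in sync across workers.
model = DDP(torch.nn.Linear(128, 128).to(device),
            device_ids=[local_rank] if use_cuda else None)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    batch = torch.randn(8, 128, device=device)
    loss = model(batch).pow(2).mean()  # placeholder loss on random data
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if dist.get_rank() == 0:
        print(f"step {step} loss {loss.item():.4f}")

dist.destroy_process_group()
```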

Training and Serving LLMs on Kubernetes: A beginner’s guide.

Large Language Models (LLMs) are revolutionizing natural language processing, but their size and complexity can make them challenging to deploy and manage. This talk will provide a beginner-friendly introduction to using Kubernetes for training and serving LLMs.
We'll cover:
- The basics of Kubernetes: a quick overview of core Kubernetes concepts (pods, containers, deployments, services) essential for understanding LLM deployment.
- LLMs and resource demands: the unique computational resource requirements of LLMs and how Kubernetes helps manage them effectively.
- Training LLMs on Kubernetes: practical guidance on setting up training pipelines, addressing data distribution, and model optimization within a Kubernetes environment.
- Serving LLMs for inference: strategies for deploying LLMs as services, load balancing, and scaling to handle real-world traffic (see the sketch below).
If you're interested in harnessing the power of LLMs for your projects, this talk will provide a solid foundation for utilizing Kubernetes to streamline your workflow.
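
Here is a minimal sketch of the "serving as a Kubernetes Deployment" idea, using the Kubernetes Python client. The image, model, and resource names are illustrative, and in practice the same objects are usually written as YAML manifests.

```python
# A minimal sketch: expose an inference server as a Deployment plus a Service.
# Names, image and model are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
labels = {"app": "llm-server"}

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="llm-server"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[client.V1Container(
                name="vllm",
                image="vllm/vllm-openai:latest",
                args=["--model", "google/gemma-2b-it"],
                ports=[client.V1ContainerPort(container_port=8000)],
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )]),
        ),
    ),
)
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="llm-server"),
    spec=client.V1ServiceSpec(selector=labels,
                              ports=[client.V1ServicePort(port=80, target_port=8000)]),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```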

LLM Observability with OpenTelemetry on Kubernetes

Large Language Models (LLMs) have gained significant prominence due to their diverse applications, ranging from conversational agents to code generation assistants. Given their increasing deployment in production environments, understanding and monitoring LLM behavior has become crucial for effective implementation and risk management.

Observability for LLMs goes beyond monitoring which prompts are sent to the model and which responses are received; it also includes monitoring the application making the call in a distributed system, and accounting for the wide range of options for using a Large Language Model, from cloud-hosted versions to local open models. Kubernetes is a common platform for deploying both the apps and the LLMs.

In this session we will explore how OpenTelemetry, the de facto Open Source tool for logging, monitoring, and tracing, can be used on top of Kubernetes to keep an eye on application and LLM behavior. We will explore tracing calls and monitoring prompts, parameters, and costs.
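
As a minimal tracing sketch, the snippet below wraps an LLM call in an OpenTelemetry span. It assumes the opentelemetry-sdk and OTLP exporter packages are installed and that an OpenTelemetry Collector is reachable at the illustrative endpoint below (for example one deployed by the OpenTelemetry Operator); the LLM call itself is a stand-in.

```python
# A minimal OpenTelemetry tracing sketch; endpoint, service and attribute names are illustrative.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "chat-frontend"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317")))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

def call_llm(prompt: str) -> str:
    # Record the attributes you care about for LLM observability: model, token counts, etc.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("llm.model", "gemma-2b-it")
        span.set_attribute("llm.prompt_tokens", len(prompt.split()))
        response = "stub response"  # replace with the actual model or API call
        span.set_attribute("llm.completion_tokens", len(response.split()))
        return response

print(call_llm("What is a service mesh?"))
```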
