
Abdel Sghiouar

Cloud Developer Advocate

Abdel Sghiouar is a senior Cloud Developer Advocate @Google Cloud, a co-host of the Kubernetes Podcast from Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh, and Serverless. Abdel started his career in datacenters and infrastructure in Morocco, where he is originally from, before moving to Google's largest EU datacenter in Belgium. Later, in Sweden, he joined Google Cloud Professional Services and spent five years working with Google Cloud customers on architecting and designing large-scale distributed systems before turning to advocacy and community work.

Badges

  • Most Active Speaker 2025
  • Most Active Speaker 2024

Building your AI agent with Agent Development Kit (ADK)

Learn to build a multi-agent system with the Agent Development Kit (ADK) by creating a Travel Helper agent that automates pre-trip information gathering. You'll learn how to break up a complex task into specialized sub-agents for tasks like search, weather, and currency conversion. You will then practice agent orchestration by combining them under a root agent that directs the workflow and integrates external APIs and tools. You'll learn how to do local testing via the ADK command-line and web interface, how to deploy the application as a scalable service on Cloud Run, and how to use a local MCP server from ADK.
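
As a rough illustration of the orchestration pattern the session walks through, here is a minimal sketch using the google-adk Python package. The agent names, model string, instructions, and the two stubbed tools are illustrative stand-ins, not the session's actual code.

```python
# Minimal multi-agent sketch with the Agent Development Kit (ADK).
# Assumes `pip install google-adk`; names, instructions, and the two
# tool functions are illustrative, not the session's actual code.
from google.adk.agents import Agent


def get_weather(city: str) -> dict:
    """Return a (stubbed) weather report for the given city."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}


def convert_currency(amount: float, from_currency: str, to_currency: str) -> dict:
    """Convert an amount between currencies using a fixed demo rate."""
    return {"amount": round(amount * 1.1, 2), "currency": to_currency}


# Specialized sub-agents, each owning one narrow task.
weather_agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash",
    description="Answers questions about weather at the destination.",
    instruction="Use the get_weather tool to answer weather questions.",
    tools=[get_weather],
)

currency_agent = Agent(
    name="currency_agent",
    model="gemini-2.0-flash",
    description="Handles currency conversion questions.",
    instruction="Use the convert_currency tool for conversions.",
    tools=[convert_currency],
)

# Root agent that directs the Travel Helper workflow by delegating
# to the specialized sub-agents.
root_agent = Agent(
    name="travel_helper",
    model="gemini-2.0-flash",
    description="Coordinates pre-trip information gathering.",
    instruction="Delegate weather and currency questions to the sub-agents.",
    sub_agents=[weather_agent, currency_agent],
)
```

With the root agent defined in a package, local testing would typically go through the `adk run` and `adk web` commands referenced in the abstract, and the same application can then be containerized and deployed to Cloud Run.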

Large Scale Distributed LLM Inference with LLM-D and Kubernetes

Running Large Language Models (LLMs) locally for experimentation is easy, but running them in large-scale architectures is not. For businesses looking to integrate LLMs into their critical paths, the high costs and scarcity of GPU/TPU accelerators present a significant challenge. Striking the balance between performance, availability, scalability, and cost-efficiency is a must.

While Kubernetes is a ubiquitous runtime for modern workloads, deploying LLM inference effectively demands a specialized approach. Enter LLM-D, a cloud-native, Kubernetes-based, high-performance distributed LLM inference framework. Its architecture centers around a well-lit path for anyone looking to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models, across a diverse and comprehensive set of hardware accelerators.

Pod Right-sizing in the Second Decade of Kubernetes

Historically, optimizing resource allocation for Kubernetes workloads was a painful trial-and-error process, forcing developers to choose between high startup costs and lengthy delays. This constant struggle to find the "just right" balance for pod resources diverted valuable time from feature development.

But with In-Place Pod Resize (IPPR) in Kubernetes, those days are over. IPPR streamlines resource management by allowing you to dynamically resize pods without a restart, opening the door to true right-sizing and vastly improved bin packing.

This talk will explore the benefits of IPPR, demonstrate how to leverage it for optimal resource allocation, and show its integration with Vertical Pod Autoscaler (VPA) to provide startup boosts for your applications.
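
For a concrete sense of the mechanism, here is a minimal sketch using the official kubernetes Python client (v27 or newer, which exposes the resizePolicy field). It assumes a cluster with in-place pod resize enabled; the pod and container names and the resource values are illustrative, and the exact resize semantics vary by Kubernetes version.

```python
# Sketch of in-place pod resize with the official `kubernetes` client.
# Assumes in-place pod resize is enabled on the cluster; names and
# resource values are illustrative.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# 1. A pod whose container declares a resizePolicy, so CPU can be
#    changed without restarting the container.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="resize-demo"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="nginx",
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "250m", "memory": "128Mi"},
                    limits={"cpu": "500m", "memory": "256Mi"},
                ),
                resize_policy=[
                    client.V1ContainerResizePolicy(
                        resource_name="cpu", restart_policy="NotRequired"
                    ),
                ],
            )
        ]
    ),
)
core.create_namespaced_pod(namespace="default", body=pod)

# 2. Later, bump CPU in place. On newer Kubernetes versions the resize
#    must go through the pods/resize subresource (for example,
#    `kubectl patch pod resize-demo --subresource resize ...`); the
#    direct spec patch below reflects the earlier alpha behaviour.
patch = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "resources": {"requests": {"cpu": "750m"}, "limits": {"cpu": "1"}},
            }
        ]
    }
}
core.patch_namespaced_pod(name="resize-demo", namespace="default", body=patch)
```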

Optimizing LLM Inference for the Rest of Us

Not every organization operates with the hyperscale resources of Anthropic, Google, or OpenAI. For the majority of businesses integrating Large Language Models (LLMs) into their critical paths, the high costs and scarcity of GPU/TPU accelerators present a significant challenge. Striking the balance between performance, availability, scalability, and cost-efficiency is a must.

While Kubernetes is a ubiquitous runtime for modern workloads, deploying LLM inference effectively demands a specialized approach. This session dives deep into practical strategies for optimizing your Kubernetes clusters and LLM inference workloads to run efficiently and cost-effectively. We will explore:

- Container and Model Optimization
- Accelerator Management
- Data & Storage
- Network & Load Balancing
- Observability

Attendees will leave with practical techniques for maximizing cost/performance for LLM inference for their AI-powered applications on Kubernetes.

Introducing Kubernetes Resource Orchestrator (KRO)

Providing application teams with a self-service way of deploying applications and their dependencies often means that platform administrators have to hide the implementation details of the platform behind simple-to-consume APIs. In the case of Kubernetes, this usually means deploying Custom Resource Definitions (CRDs) that are either third-party or custom built in-house. In addition to allowing Kubernetes users to manage non-Kubernetes objects via the Kubernetes Resource Model (i.e., YAML), these CRDs also abstract away the details of how some resources get created and managed.

In this session, you'll learn how KRO allows platform teams to:
- Create high-level APIs that reduce YAML complexity while maintaining flexibility
- Support both native Kubernetes and cloud-specific resources for more efficient orchestration
- Integrate KRO into your Kubernetes workflows for better scalability and simplicity
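
To give a flavor of the consumer side of this, here is a sketch of what an application team's workflow could look like once a platform team has published a simplified API through KRO. The WebApp kind, its group/version (kro.run/v1alpha1), and its fields are hypothetical; they depend entirely on how the platform team writes the underlying ResourceGraphDefinition.

```python
# Sketch: consuming a hypothetical KRO-generated API with the Kubernetes
# dynamic client. The "WebApp" kind, its group/version, and its fields
# are assumptions for illustration only.
from kubernetes import config, dynamic
from kubernetes.client import api_client

config.load_kube_config()
dyn = dynamic.DynamicClient(api_client.ApiClient())

# Instead of hand-writing a Deployment, Service, and Ingress, the app
# team creates one small custom resource; KRO expands it server-side
# into the underlying resources defined by the platform team.
webapp_api = dyn.resources.get(api_version="kro.run/v1alpha1", kind="WebApp")
webapp_api.create(
    namespace="default",
    body={
        "apiVersion": "kro.run/v1alpha1",
        "kind": "WebApp",
        "metadata": {"name": "storefront"},
        "spec": {"image": "nginx:1.27", "replicas": 2},
    },
)
```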

Introducing LLM Instance Gateways for Efficient Inference Serving

Large Language Models (LLMs) are revolutionizing applications, but efficiently serving them in production is a challenge. Existing API endpoints, load balancers, and gateways focus on HTTP/gRPC traffic, which is already a well-defined space. LLM traffic is completely different: a request to an LLM is usually characterized by the size of the prompt, the size and efficiency of the model, and so on.

Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.

What will you learn? The core challenges of LLM inference serving: Understand the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.

We will dive into how LLM Instance Gateways work, how they route requests, manage resources, and ensure fairness among different LLM use cases.

Taming Agentic AI: How to Run Agent-Generated Code Safely on Kubernetes

As AI agents increasingly evolve from simple chatbots to autonomous systems capable of generating and executing code and manipulating data, they introduce significant security and operational challenges. This talk explores Agent Sandbox, a Kubernetes-native solution designed to run these non-deterministic and untrusted workloads safely and efficiently.

We will discuss how Agent Sandbox bridges the gap between the safety of virtual machines and the speed of containers. You will learn how it utilizes a dedicated CRD to manage templates and lets agents run generated code in isolated containers, leveraging gVisor to provide a user-space kernel runtime.

Join us to discover how to scale your AI agents confidently, knowing that even if they go rogue, your cluster remains secure.
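
Agent Sandbox's own CRD is not shown here; as a stand-in for the isolation layer the abstract describes, the sketch below uses the kubernetes Python client to register a gVisor RuntimeClass and pin an untrusted workload to it. It assumes nodes already have gVisor (runsc) installed and wired into the container runtime, and all names are illustrative.

```python
# Sketch of the gVisor isolation primitive that sandboxed agent workloads
# build on, using the official `kubernetes` client. Assumes gVisor is
# installed on the nodes; names are illustrative.
from kubernetes import client, config

config.load_kube_config()

# 1. A RuntimeClass that maps pods to the gVisor handler.
node_api = client.NodeV1Api()
node_api.create_runtime_class(
    client.V1RuntimeClass(
        metadata=client.V1ObjectMeta(name="gvisor"),
        handler="gvisor",
    )
)

# 2. An untrusted, agent-generated workload pinned to that runtime class,
#    so it runs against gVisor's user-space kernel rather than the host kernel.
core = client.CoreV1Api()
core.create_namespaced_pod(
    namespace="default",
    body=client.V1Pod(
        metadata=client.V1ObjectMeta(name="untrusted-agent-code"),
        spec=client.V1PodSpec(
            runtime_class_name="gvisor",
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="runner",
                    image="python:3.12-slim",
                    command=["python", "-c", "print('agent-generated code runs here')"],
                )
            ],
        ),
    ),
)
```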

KubeCon + CloudNativeCon Europe 2026 (upcoming)

March 2026 Amsterdam, The Netherlands

AI Memory and Founders Night (upcoming)

February 2026 Berlin, Germany
