Abdel Sghiouar

Cloud Developer Advocate

Abdel Sghiouar is a Senior Cloud Developer Advocate at Google Cloud, a co-host of the Kubernetes Podcast from Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh, and Serverless. Abdel started his career in datacenters and infrastructure in Morocco, where he is originally from, before moving to Google's largest EU datacenter in Belgium. He then joined Google Cloud Professional Services in Sweden and spent five years working with Google Cloud customers on architecting and designing large-scale distributed systems before turning to advocacy and community work.

Badges

  • Most Active Speaker 2025
  • Most Active Speaker 2024

Building your AI agent with Agent Development Kit (ADK)

Learn to build a multi-agent system with the Agent Development Kit (ADK) by creating a Travel Helper agent that automates pre-trip information gathering. You'll learn how to break a complex task into specialized sub-agents for tasks like search, weather, and currency conversion, then practice agent orchestration by combining them under a root agent that directs the workflow and integrates external APIs and tools. You'll also learn how to do local testing via the ADK command-line and web interface, how to deploy the application as a scalable service on Cloud Run, and how to use a local MCP server from ADK.
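The root-agent/sub-agent pattern described above can be sketched in plain Python. This is an illustrative stand-in, not the real ADK API: the class names, the `handle` callables, and the dispatch logic are all hypothetical, and a real ADK agent would call external search, weather, and currency APIs as tools.

```python
# Hypothetical sketch of root-agent orchestration; names are
# illustrative stand-ins, not the real ADK classes.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SubAgent:
    name: str
    handle: Callable[[str], str]  # returns this agent's contribution

@dataclass
class RootAgent:
    sub_agents: list[SubAgent] = field(default_factory=list)

    def run(self, destination: str) -> dict[str, str]:
        # The root agent directs the workflow: each specialized
        # sub-agent contributes one piece of pre-trip information.
        return {a.name: a.handle(destination) for a in self.sub_agents}

# Stand-in tools; a real agent would call external APIs here.
root = RootAgent(sub_agents=[
    SubAgent("search", lambda d: f"top sights in {d}"),
    SubAgent("weather", lambda d: f"forecast for {d}"),
    SubAgent("currency", lambda d: f"exchange rate for {d}"),
])
report = root.run("Amsterdam")
```

The design point the sketch captures is separation of concerns: each sub-agent owns one narrow task, and only the root agent knows the overall workflow.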

Large Scale Distributed LLM Inference with LLM-D and Kubernetes

Running Large Language Models (LLMs) locally for experimentation is easy, but running them in large-scale architectures is not. For businesses looking to integrate LLMs into their critical paths, the high costs and scarcity of GPU/TPU accelerators present a significant challenge. Striking the balance between performance, availability, scalability, and cost-efficiency is a must.

While Kubernetes is a ubiquitous runtime for modern workloads, deploying LLM inference effectively demands a specialized approach. Enter LLM-D, a cloud-native, Kubernetes-based, high-performance distributed LLM inference framework. Its architecture centers around a well-lit path for anyone looking to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models across a diverse and comprehensive set of hardware accelerators.

Pod Right-sizing in the Second Decade of Kubernetes

Historically, optimizing resource allocation for Kubernetes workloads was a painful trial-and-error process, forcing developers to choose between high startup costs or lengthy delays. This constant struggle to find the "just right" balance for pod resources diverted valuable time from feature development.

But with In-Place Pod Resize (IPPR) in Kubernetes, those days are over. IPPR streamlines resource management by allowing you to dynamically resize pods without a restart, opening the door to true right-sizing and vastly improved bin packing.

This talk will explore the benefits of IPPR, demonstrate how to leverage it for optimal resource allocation, and show its integration with Vertical Pod Autoscaler (VPA) to provide startup boosts for your applications.
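As a concrete illustration of in-place resize, a pod opts in per container and per resource via `resizePolicy`. The manifest below is a minimal sketch assuming a cluster with the In-Place Pod Resize feature enabled; image and values are placeholders.

```yaml
# Sketch: a pod that allows CPU to be resized without a restart.
# Requires a Kubernetes version with In-Place Pod Resize enabled.
apiVersion: v1
kind: Pod
metadata:
  name: ippr-demo
spec:
  containers:
  - name: app
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # CPU changes apply in place
    - resourceName: memory
      restartPolicy: RestartContainer # memory changes restart the container
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
```

This is what enables the "startup boost" pattern mentioned above: request generous CPU at startup, then resize down in place once the application is warm.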

Optimizing LLM Inference for the Rest of Us

Not every organization operates with the hyperscale resources of Anthropic, Google, or OpenAI. For the majority of businesses integrating Large Language Models (LLMs) into their critical paths, the high costs and scarcity of GPU/TPU accelerators present a significant challenge. Striking the balance between performance, availability, scalability, and cost-efficiency is a must.

While Kubernetes is a ubiquitous runtime for modern workloads, deploying LLM inference effectively demands a specialized approach. This session dives deep into practical strategies for optimizing your Kubernetes clusters and LLM Inference workloads to run efficiently and cost effectively. We will explore:

- Container and Model Optimization
- Accelerator Management
- Data & Storage
- Network & Load Balancing
- Observability

Attendees will leave with practical techniques for maximizing cost/performance for LLM inference for their AI-powered applications on Kubernetes.
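One small example of the kind of cost/performance reasoning involved: sizing the KV cache, which often dominates accelerator memory during inference. The formula below is the standard back-of-envelope estimate (2 tensors, K and V, per layer); the architecture numbers in the example are assumptions roughly matching a 7B-class model, not a measurement of any specific deployment.

```python
# Back-of-envelope KV-cache sizing for LLM serving capacity planning.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # 2x for the K and V tensors; bytes_per_elem=2 assumes fp16/bf16.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed 7B-class architecture: 32 layers, 32 KV heads, head dim 128.
per_request = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1)
print(f"{per_request / 2**30:.1f} GiB per 4k-token request")  # → 2.0 GiB
```

Estimates like this feed directly into accelerator management and bin-packing decisions: how many concurrent requests fit on one GPU before the cache spills.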

Introducing LLM Instance Gateways for Efficient Inference Serving

Large Language Models (LLMs) are revolutionizing applications, but efficiently serving them in production is a challenge. Existing API endpoints, load balancers, and gateways focus on HTTP/gRPC traffic, which is already a well-defined space. LLM traffic is completely different: a request to an LLM is characterized by factors such as the size of the prompt and the size and efficiency of the model.

Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.

What will you learn? The core challenges of LLM inference serving: Understand the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.

We will dive into how LLM Instance Gateways work, how they route requests, manage resources, and ensure fairness among different LLM use cases.
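To make the routing idea concrete, here is a toy sketch of one possible gateway policy: send each request to the replica with the least in-flight work, measured in tokens rather than requests. This is a simplified stand-in for the richer criteria (prompt size, model, KV-cache state, fairness) a real LLM gateway would weigh; all names are hypothetical.

```python
# Toy token-aware routing policy; a stand-in for real gateway logic.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    inflight_tokens: int = 0

def route(replicas: list[Replica], prompt_tokens: int) -> Replica:
    # Least-loaded by queued token count, not request count: LLM cost
    # scales with tokens, so token-aware balancing is fairer.
    target = min(replicas, key=lambda r: r.inflight_tokens)
    target.inflight_tokens += prompt_tokens
    return target

pool = [Replica("gpu-a", inflight_tokens=900), Replica("gpu-b", inflight_tokens=100)]
chosen = route(pool, prompt_tokens=512)  # picks gpu-b, the lighter replica
```

The contrast with round-robin is the point: two requests can cost wildly different amounts, so request-count balancing alone misallocates accelerators.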

Simplify the DevEx With Kubernetes Resource Orchestrator (KRO)

To empower application teams with self-service deployments, platform administrators must hide underlying complexity behind simple, consumable APIs. In the Kubernetes ecosystem, this usually requires building or maintaining Custom Resource Definitions (CRDs). While CRDs allow users to manage both cloud and local resources via YAML, they are often difficult to build and maintain at scale.

Kubernetes Resource Orchestrator (KRO) changes this dynamic. Instead of writing complex controllers from scratch, KRO allows platform teams to easily bundle multiple resources into a single, high-level API.

In this session, you’ll learn how KRO helps you:

- Reduce YAML Fatigue: Create high-level APIs that hide boilerplate while keeping the flexibility developers need.
- Unified Orchestration: Seamlessly manage both native Kubernetes objects and cloud-specific resources (like buckets, topics and IAM) in one place.
- Scale Efficiently: Integrate KRO into existing workflows to simplify platform management and improve developer experience.
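To give a flavor of the bundling described above, the fragment below sketches a KRO ResourceGraphDefinition exposing a simple `WebApp` API that expands into a Deployment. Field names follow the kro.run project's documentation at the time of writing and may change; treat this as a hedged illustration, not a verified manifest.

```yaml
# Hedged sketch of a KRO ResourceGraphDefinition; check current
# kro.run docs for exact apiVersion and schema syntax.
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp          # the high-level API developers consume
    spec:
      name: string
      image: string
  resources:
    - id: deployment
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: ${schema.spec.name}
          template:
            metadata:
              labels:
                app: ${schema.spec.name}
            spec:
              containers:
                - name: app
                  image: ${schema.spec.image}
```

Developers then create a two-field `WebApp` object instead of the full Deployment boilerplate, which is the "reduce YAML fatigue" promise in practice.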

Taming Agentic AI: How to Run Agent-Generated Code Safely on Kubernetes

As AI agents increasingly evolve from simple chatbots to autonomous systems capable of generating and executing code and manipulating data, they introduce significant security and operational challenges. This talk explores Agent Sandbox, a Kubernetes-native solution designed to run these non-deterministic and untrusted workloads safely and efficiently.

We will discuss how Agent Sandbox bridges the gap between the safety of virtual machines and the speed of containers. You will learn how it utilizes a dedicated CRD to manage templates and allow Agents to run generated code in an isolated container leveraging gVisor to provide a user-space kernel runtime.

Join us to discover how to scale your AI agents confidently, knowing that even if they go rogue, your cluster remains secure.
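The gVisor isolation described above is typically wired into Kubernetes via a RuntimeClass. The fragment below is a minimal sketch; the handler name depends on how the `runsc` runtime is registered on your nodes, and the pod contents are placeholders.

```yaml
# Sketch: running untrusted agent code under gVisor via RuntimeClass.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: gvisor   # must match the runsc handler configured on the node
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-agent
spec:
  runtimeClassName: gvisor   # syscalls go through gVisor's user-space kernel
  containers:
    - name: agent-code
      image: python:3.12-slim
      command: ["python", "-c", "print('hello from the sandbox')"]
```

This keeps the agent's generated code off the host kernel's syscall surface while retaining container-like startup speed, the VM-versus-container trade-off the session addresses.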

KubeCon + CloudNativeCon Europe 2026 (upcoming)

March 2026 Amsterdam, The Netherlands

AI Memory and Founders Night

February 2026 Berlin, Germany
