Pavneet Ahluwalia

Product @ Microsoft Azure

Toronto, Canada

Pavneet is a Staff Product Manager at Together AI, leading the IaaS and GPU clusters product for AI-native startups. Previously, he was a product lead for Azure Kubernetes Service, focusing on scale and performance. He has 7 years of experience working with containers and cloud platforms. Before that, he worked in digital marketing and data science and co-founded 2 startups with successful exits.

Area of Expertise

Business & Management
Information & Communications Technology
Manufacturing & Industrial Materials

Topics

Cloud Native
Cloud & DevOps
Cloud Computing
Cloud & Infrastructure
Cloud Computing on the Azure Platform
Cloud Technology
Cloud Native Infrastructure
Interviewing
Tech Recruiting
Technology
Technology Strategy
Cloud strategy
Kubernetes
Azure Kubernetes Service (AKS)
KubeCon
Cloud Native Applications
MCP
Model Context Protocol (MCP)
AI Agents

From Chaos to Clarity: Building an Enterprise-Scale MCP Server for Kubernetes Troubleshooting

How we built an MCP server to power AI-driven Kubernetes troubleshooting at enterprise scale — balancing security, observability, and tool sprawl in real production environments!

As Kubernetes platforms scale, management becomes the hardest problem. The Model Context Protocol (MCP) offers a powerful way to connect AI agents to real operational systems.

Our MCP server acts as a control plane for AI-driven diagnostics, dynamically querying cluster APIs, metrics stores, and logs, and installing tools on demand, to answer “what’s broken and why?” in real time.

We’ll cover:
- MCP server architecture for K8s troubleshooting, including multi-cluster access, tool isolation, and safe execution patterns
- Hard-earned lessons on tool sprawl — how adding more tools degraded latency and reliability, and how we simplified everything into generic, composable CLI-based tools
- Security evolution: moving from OAuth-based auth to managed identity and enterprise RBAC
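
To make the "generic, composable CLI-based tools" idea concrete, here is a minimal sketch of how a single tool might accept any read-only kubectl command instead of exposing one tool per diagnostic. This is an illustration under stated assumptions, not the speakers' implementation: the allowlist contents and the function names `validate_kubectl_command` and `run_tool` are hypothetical.

```python
import shlex
import subprocess

# Hypothetical allowlist of read-only kubectl verbs. The real server's
# policy is not described in the abstract.
READ_ONLY_VERBS = frozenset({"get", "describe", "logs", "top", "events"})


def validate_kubectl_command(command: str) -> list[str]:
    """Parse a command string and return its argv if it is an allowed,
    read-only kubectl invocation. Raise ValueError otherwise.

    The check fails closed: the verb must be the first argument after
    "kubectl", so forms such as "kubectl --context x get pods" are
    rejected rather than guessed at.
    """
    argv = shlex.split(command)
    if len(argv) < 2 or argv[0] != "kubectl":
        raise ValueError("only kubectl commands are allowed")
    verb = argv[1]
    if verb not in READ_ONLY_VERBS:
        raise ValueError(f"verb {verb!r} is not in the read-only allowlist")
    return argv


def run_tool(command: str, timeout_s: float = 30.0) -> str:
    """Run a validated command without a shell and return its output."""
    argv = validate_kubectl_command(command)
    # Passing a list (no shell=True) means shell metacharacters in the
    # input are treated as literal arguments, not interpreted.
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout_s, check=False
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

With one generic tool like this, the agent composes the diagnostics it needs from ordinary kubectl arguments, and the server only has to enforce one validation path rather than maintain a growing catalog of narrow tools.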

Beyond ChatOps: Agentic AI in Kubernetes—What Works, What Breaks, and What’s Next

Agentic AI is evolving from hype to hands-on reality—no longer just copilots, but autonomous actors in Kubernetes clusters. But how effective are these AI agents in real-world ops?

This panel brings together builders and operators who've deployed LLM-powered agents at scale in production to share what worked, what broke, and what surprised them. Expect a candid, high-signal conversation on the true strengths and sharp limitations of AI agents for Kubernetes.

SREs and platform engineers/operators: come with questions, leave with a clearer sense of where AI can reduce toil, when it still needs babysitting (human-in-the-loop), and how to experiment and deploy safely.

We’ll cover:

- High-efficacy use cases: RCA, triage, incident summarization
- Common failure patterns: hallucinations, context loss, unpredictability, alert attention
- Evaluation strategies in dynamic prod environments
- Design trends: agent chaining, feedback loops, safety guardrails

CLI Agent for AKS: AI-Powered Troubleshooting from Your Terminal

The CLI Agent for AKS brings an AI troubleshooting loop directly into the Azure CLI (az), so operators can ask natural-language questions (e.g., “why is my pod Pending?”) and get grounded reasoning, targeted diagnostics, and safe, human-in-the-loop remediation steps without leaving the terminal. It’s built on open source (HolmesGPT + AKS-MCP), runs locally with your Azure RBAC, and is extensible via runbooks and MCP toolsets. In this demo, we will show how to get started with the CLI Agent for AKS and the AKS MCP server, troubleshoot some complex networking issues using "az aks agent", and share how you can get involved in the community.

Why this matters (3 takeaways)
- Cut MTTR: Turn scattered logs/metrics into concise RCA with actionable next steps.
- Secure by design: Local execution + Azure CLI auth; no cluster changes without explicit approval.
- Composable & open: Plug in your AI provider, observability stack, and MCP tools/runbooks as “lego blocks”, enabling the community to contribute.
