From Chaos to Clarity: Building an Enterprise-Scale MCP Server for Kubernetes Troubleshooting
How we built an MCP server to power AI-driven Kubernetes troubleshooting at enterprise scale, balancing security, observability, and tool sprawl in real production environments.
As Kubernetes platforms scale, management becomes the hardest problem. The Model Context Protocol (MCP) offers a powerful way to connect AI agents to real operational systems.
Our MCP server acts as a control plane for AI-driven diagnostics: it dynamically queries cluster APIs, metrics stores, and logs, and installs tools on demand, to answer “what’s broken and why?” in real time.
We’ll cover:
- MCP server architecture for K8s troubleshooting, including multi-cluster access, tool isolation, and safe execution patterns
- Hard-earned lessons on tool sprawl: how adding more tools degraded latency and reliability, and how we consolidated them into a small set of generic, composable CLI-based tools
- Security evolution: moving from OAuth-based auth to managed identity and enterprise RBAC
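To make the "generic, composable CLI-based tools" and "safe execution patterns" ideas concrete, here is a minimal sketch of what one such tool might look like. It is not the implementation from the talk. The allow-lists, the context names (`prod-east`, `prod-west`), and the function names are all hypothetical, and it assumes `kubectl` is on the PATH with kubeconfig contexts already provisioned. The MCP wiring itself is left out.

```python
import subprocess

# Hypothetical allow-list: only read-only kubectl verbs are exposed to the agent.
READ_ONLY_VERBS = frozenset({"get", "describe", "logs", "top"})

# Hypothetical set of kubeconfig contexts the server may target (multi-cluster access).
ALLOWED_CONTEXTS = frozenset({"prod-east", "prod-west"})

# Flags that could redirect a call to another cluster or identity.
# This list is illustrative, not exhaustive.
BLOCKED_FLAG_PREFIXES = (
    "--kubeconfig", "--context", "--cluster", "--user",
    "--as", "--token", "--server",
)


def build_kubectl_argv(context, verb, args):
    """Validate a request and return an argv list; raise ValueError if unsafe."""
    if context not in ALLOWED_CONTEXTS:
        raise ValueError(f"context not allowed: {context!r}")
    if verb not in READ_ONLY_VERBS:
        raise ValueError(f"verb not allowed: {verb!r}")
    for arg in args:
        if arg.startswith(BLOCKED_FLAG_PREFIXES):
            raise ValueError(f"flag not allowed: {arg!r}")
    # The context is set by the server, never by the caller's arguments.
    return ["kubectl", "--context", context, verb, *args]


def run_readonly(context, verb, args, timeout_s=15.0):
    """Run a validated command without a shell and return its text output."""
    argv = build_kubectl_argv(context, verb, args)
    completed = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # bound latency so one slow call cannot stall the agent
        check=False,
    )
    return completed.stdout if completed.returncode == 0 else completed.stderr
```

In this sketch a single tool covers many diagnostics, for example `run_readonly("prod-east", "get", ["pods", "-n", "payments"])` or `run_readonly("prod-east", "logs", ["deploy/api", "--tail=50"])`, instead of registering one narrowly scoped tool per question. Passing an argv list rather than a shell string avoids shell injection, and the allow-lists keep the agent on read-only operations.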