Metrics, Traces, and Tokens: OpenTelemetry for LLM Workloads
LLM services pose a new class of observability problems: non‑deterministic latency, cost per token, and hallucination or quality metrics that conventional tools often miss.
By wrapping every LLM request in OpenTelemetry traces, metrics, and logs, and running that instrumentation in a production Kubernetes environment, teams can surface cost and latency signals, flag hallucinations in real time, and optimize resource usage across clusters and applications.
In this talk we'll walk through the SIG‑approved OpenTelemetry semantic conventions for GenAI, demonstrate a ready‑to‑deploy Kubernetes observability stack, and present case studies that show how observability turned an otherwise opaque AI service into a measurable, reliable component of a cloud‑native platform.
Why One-Size-Fits-All MCPs Don’t Work
MCPs work best when they are designed for a specific problem. We tried building a single MCP to serve our entire observability stack, and it failed: without task-specific tools, it became too generic for agents to use effectively. This talk covers what went wrong and why MCPs should be purpose-built around concrete workflows rather than trying to do everything at once.
OpenTelemetry: Auto-Instrumentation, Not Blind Instrumentation
Observability is no longer optional in cloud-native systems, but "turn it on and hope" is not a strategy. OpenTelemetry auto-instrumentation promises instant visibility, yet in practice it can quietly introduce cost, performance, and security risks when used without intent.
This talk challenges the idea that more telemetry automatically means better observability. Using a real production example as context, we'll show how auto-instrumentation can surface far more internal behavior than teams expect, creating noise instead of insight. We'll examine what goes wrong, why defaults are dangerous, and how easily good intentions can backfire at scale.
We'll also be clear about the upside: when manual instrumentation isn't feasible, auto-instrumentation is still better than having no telemetry at all. The key is knowing where it helps, where it hurts, and how to use it deliberately.
Lightning: GitHub Actions on Kubernetes
Combining GitHub Actions CI/CD workflows with the power of multitenant Kubernetes clusters for enhanced scalability, security, and resource utilization.