Session

Beyond Dashboards: Operational Reasoning for LLM Inference Systems

Large language model serving systems expose hundreds of metrics, but operators are often left interpreting dashboards manually during incidents, performance regressions, and capacity planning exercises.

This talk introduces KVScope, an open-source observability and diagnostics framework for LLM inference systems built on the PyTorch ecosystem. KVScope attaches to running inference servers, collects runtime telemetry, and converts low-level metrics into operational narratives that explain what is happening, why it is happening, and what comes next.

Using vLLM on NVIDIA H200 infrastructure as a practical case study, we demonstrate how signals such as KV cache utilization, scheduler backlog, throughput, time to first token (TTFT), time per output token (TPOT), and request concurrency can be transformed into higher-level operational states, including queue pressure, KV cache pressure, saturation, throughput collapse, and recovery.
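As a rough illustration of the idea, the mapping from raw signals to operational states can be sketched as a small rule-based classifier. This is a minimal sketch and not KVScope's actual implementation or API: the `Snapshot` fields, the `classify` and `label_timeline` functions, the state names as strings, and every threshold value are hypothetical, chosen only to show the shape of the technique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """One sample of runtime signals (hypothetical field names)."""
    kv_cache_util: float     # fraction of KV cache in use, 0.0 to 1.0
    waiting_requests: int    # scheduler backlog
    running_requests: int    # request concurrency
    tokens_per_sec: float    # generation throughput


def classify(s: Snapshot, baseline_tps: float,
             kv_high: float = 0.9, queue_high: int = 8,
             collapse_ratio: float = 0.5) -> str:
    """Map one snapshot to an operational state.

    All thresholds are illustrative assumptions, not tuned values.
    """
    kv_pressure = s.kv_cache_util >= kv_high
    queue_pressure = s.waiting_requests >= queue_high
    # Collapse: the server is under pressure and busy, yet throughput
    # has fallen well below its healthy baseline.
    if (s.running_requests > 0 and (kv_pressure or queue_pressure)
            and s.tokens_per_sec < collapse_ratio * baseline_tps):
        return "throughput_collapse"
    if kv_pressure and queue_pressure:
        return "saturation"
    if kv_pressure:
        return "kv_cache_pressure"
    if queue_pressure:
        return "queue_pressure"
    return "healthy"


def label_timeline(snapshots: List[Snapshot], baseline_tps: float) -> List[str]:
    """Label a sequence of snapshots, marking the first healthy sample
    after a degraded one as 'recovery'."""
    labels = []
    prev = "healthy"
    for s in snapshots:
        state = classify(s, baseline_tps)
        labels.append("recovery" if state == "healthy" and prev != "healthy" else state)
        prev = state
    return labels


if __name__ == "__main__":
    samples = [
        Snapshot(0.50, 0, 4, 1000.0),
        Snapshot(0.95, 0, 4, 1000.0),
        Snapshot(0.95, 20, 4, 900.0),
        Snapshot(0.95, 20, 4, 300.0),
        Snapshot(0.40, 0, 4, 1000.0),
    ]
    print(label_timeline(samples, baseline_tps=1000.0))
```

A real system would likely smooth signals over a window and add hysteresis before changing state, so that a single noisy sample does not flip the label.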

The session covers metric collection, state modeling, timeline construction, event detection, narrative generation, regression analysis, and forecasting techniques. We also discuss how the architecture can be extended through an abstraction layer designed for future support of frameworks such as SGLang and TensorRT.

Sai Sravan Cherukuri

Open Source Enthusiast and DevSecOps Architect
