Why Traces Didn’t Explain Our Search Latency, Until We Changed How We Used OpenTelemetry
Search performance issues are notoriously hard to debug. We assumed that adding OpenTelemetry tracing across our search pipeline would immediately make latency problems obvious. Instead, we ended up with detailed traces that explained very little.
In this session, I’ll share why our initial OpenTelemetry setup failed to help us debug search latency, even though we were “doing everything right.” The real issue wasn’t missing spans—it was missing context. We weren’t capturing the right attributes to explain shard fan-out, query complexity, cache behavior, or ranking stages.
I’ll walk through how we rethought instrumentation for search systems: what not to trace, which semantic attributes actually matter, and how to connect user-facing queries to backend execution paths. Using real examples, I’ll show how a small change in instrumentation exposed retry storms and cache churn that were invisible before—and how fixing those reduced tail latency significantly.
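To make the idea concrete: the kind of context described above can be computed per query and attached to the active span. This is an illustrative sketch only — the attribute names and the `search_span_attributes` helper are hypothetical, not official OpenTelemetry semantic conventions or the speaker's actual instrumentation:

```python
# Sketch: derive the span attributes that make search latency explainable.
# Attribute names below are illustrative, not OpenTelemetry semantic conventions.

def search_span_attributes(query_terms, shards_queried, shards_total,
                           cache_hits, cache_lookups, retries):
    """Build a flat attribute dict suitable for span.set_attribute() calls."""
    return {
        "search.query.term_count": len(query_terms),           # query complexity
        "search.shard.fanout": shards_queried,                 # fan-out width
        "search.shard.fanout_ratio": shards_queried / shards_total,
        # Low hit ratios across many spans surface cache churn.
        "search.cache.hit_ratio": (cache_hits / cache_lookups) if cache_lookups else 0.0,
        # Non-zero retry counts clustered in time surface retry storms.
        "search.retry.count": retries,
    }

attrs = search_span_attributes(
    query_terms=["open", "telemetry", "latency"],
    shards_queried=12, shards_total=48,
    cache_hits=3, cache_lookups=10, retries=2,
)

# With the OpenTelemetry API, these would be attached to the current span:
#   for key, value in attrs.items():
#       span.set_attribute(key, value)
```

Attributes like these turn a span from "this stage took 900 ms" into "this stage took 900 ms while fanning out to a quarter of the cluster with a cold cache and two retries" — which is the difference between a trace that records latency and one that explains it.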
This talk is less about OpenTelemetry basics and more about learning how to use it effectively for complex, distributed search workloads.