Why Most MCP Tools Fail Silently, And How to Measure It

Most MCP servers don't break because of bugs. They break because the tool descriptions are too vague for agents to reliably pick the right tool.
Two research papers put numbers to this. A SAIL Research study of 856 tools across 103 MCP servers found that 97% have at least one quality defect, 56% don't clearly state what the tool does, and 89% give no guidance on when not to use it. A second study of 10,831 servers found that well-written descriptions get selected 260% more often, and that fixing poor descriptions raises task success rates by roughly 6 points.
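To make the two most common defects concrete, here is a minimal sketch in Python of the same tool defined twice. The tool names, descriptions, and the `missing_param_descriptions` helper are hypothetical illustrations, not taken from either study or from TDQS; only the `name`, `description`, and `inputSchema` fields follow the shape of an MCP tool definition.

```python
# Hypothetical example: one tool, described badly and then well.

vague_tool = {
    "name": "search",
    "description": "Searches stuff.",  # no clear purpose, no usage guidance
    "inputSchema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},  # parameter is undocumented
    },
}

clear_tool = {
    "name": "search_orders",
    "description": (
        "Search existing customer orders by free-text query. "
        "Use this to look up an order before answering a status question. "
        "Do not use it to create or modify orders; it is read-only."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "q": {
                "type": "string",
                "description": "Order ID, customer email, or product name.",
            }
        },
        "required": ["q"],
    },
}


def missing_param_descriptions(tool: dict) -> list[str]:
    """Return the names of parameters that have no description (illustrative check only)."""
    props = tool.get("inputSchema", {}).get("properties", {})
    return [name for name, schema in props.items() if not schema.get("description")]


print(missing_param_descriptions(vague_tool))
print(missing_param_descriptions(clear_tool))
```

The second definition states what the tool does, when to use it, and when not to, which are the gaps the first study reports as most common.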
Working with the Glama founder, I helped develop the Tool Definition Quality Score (TDQS), an open-source framework that scores every MCP tool on six dimensions: Purpose Clarity, Usage Guidelines, Behavioral Transparency, Parameter Semantics, Conciseness, and Contextual Completeness. Each tool receives a score from 1 to 5 on each dimension, with specific feedback on what is missing and why it matters.
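As a rough illustration of what a per-tool result could look like, the sketch below models the six dimensions as a small Python data structure. The `ToolScore` class, its field names, and the unweighted average are assumptions made for this example; the abstract does not say how TDQS stores or aggregates its scores.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToolScore:
    """Hypothetical container for one tool's six dimension scores (1 to 5 each)."""

    purpose_clarity: int
    usage_guidelines: int
    behavioral_transparency: int
    parameter_semantics: int
    conciseness: int
    contextual_completeness: int

    def __post_init__(self) -> None:
        # Reject any score outside the 1-5 range described in the abstract.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def average(self) -> float:
        # Assumption: an unweighted mean; the real framework may weight dimensions.
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)

    def weakest(self) -> list[str]:
        # Dimensions with the lowest score, i.e. where to improve first.
        lowest = min(getattr(self, f.name) for f in fields(self))
        return [f.name for f in fields(self) if getattr(self, f.name) == lowest]


score = ToolScore(
    purpose_clarity=4,
    usage_guidelines=1,
    behavioral_transparency=2,
    parameter_semantics=3,
    conciseness=5,
    contextual_completeness=3,
)
print(score.average())
print(score.weakest())
```

Keeping the dimensions as separate fields, rather than collapsing them into one number, preserves the per-dimension feedback that tells an author what to fix.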
This talk covers how TDQS was built, what scoring thousands of real servers revealed, and how server authors can use it to ship tools agents actually invoke correctly. The framework is open source and already live across Glama-hosted servers.

Om Shree

Founder, Shreesozo | MCP & Agentic AI Content Studio

Amritsar, India
