Session
Building an MCP Server That Cuts AI Coding Tokens by 94%: Architecture and Benchmarks
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need context. On a medium-sized project, that means 45,000 tokens per query when the agent needs only 4,900. Input tokens account for 85-95% of your bill.
We built Code Context Engine, an open-source MCP server that indexes your codebase locally using tree-sitter AST parsing and sqlite-vec embeddings. A single index serves Claude Code, Cursor, VS Code, Gemini CLI, and Codex simultaneously through the standard MCP protocol. The result: 94% token savings, benchmarked on the FastAPI codebase with 20 real queries.
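To make the integration concrete, a server like this boils down to exposing a retrieval tool over MCP. Here is a minimal sketch using the official MCP Python SDK's FastMCP helper; the tool name, result shape, and the search_index stub are illustrative assumptions, not the project's actual API:

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("code-context-engine")

def search_index(query: str, top_k: int) -> list[dict]:
    # Hypothetical stand-in for the hybrid retrieval described below.
    return [{"path": "app/main.py", "line": 1, "snippet": "# ..."}]

@mcp.tool()
def search_code(query: str, top_k: int = 5) -> str:
    """Return the most relevant code chunks for a natural-language query."""
    hits = search_index(query, top_k=top_k)
    return "\n\n".join(f"{h['path']}:{h['line']}\n{h['snippet']}" for h in hits)

if __name__ == "__main__":
    mcp.run()  # stdio transport: MCP clients (Claude Code, Cursor, ...) launch it directly
```

Because MCP standardizes the tool interface, the same process can serve every client at once; each one just spawns or connects to the server.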
In this talk we cover: tree-sitter AST chunking, hybrid retrieval (vector + BM25 via Reciprocal Rank Fusion), confidence scoring, content-hash embedding cache (96% hit rate), secret redaction, and cross-session memory via a SQLite knowledge graph with CALLS/IMPORTS edges.
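The fusion step at the heart of the hybrid retrieval is small. A minimal Reciprocal Rank Fusion sketch in Python, using the standard formula with the conventional k=60 constant (the function and example inputs are illustrative, not the project's code):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. vector search and BM25) into one.

    Each document scores sum(1 / (k + rank_i)) over the lists that contain
    it; k=60 is the constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A chunk ranked highly by both retrievers rises to the top.
vector_hits = ["auth.py::login", "db.py::connect", "util.py::hash"]
bm25_hits = ["auth.py::login", "util.py::hash"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['auth.py::login', 'util.py::hash', 'db.py::connect']
```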
We share live benchmarks, MCP integration demos, and engineering tradeoffs: sqlite-vec over LanceDB (99% smaller), truncation over LLM summarization, RRF over learned reranking.
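The content-hash embedding cache mentioned above can likewise be sketched with only the Python standard library: key each chunk's embedding by a hash of its text, so re-indexing pays only for chunks whose content actually changed. The table name and embed_fn hook here are assumptions, not the project's schema:

```python
import hashlib
import json
import sqlite3

con = sqlite3.connect("cache.db")
con.execute("CREATE TABLE IF NOT EXISTS emb_cache (hash TEXT PRIMARY KEY, vec TEXT)")

def embed_with_cache(chunk: str, embed_fn) -> list[float]:
    """Return the embedding for a chunk, computing it only on cache miss.

    Keying by a content hash means unchanged chunks are never re-embedded,
    which is what drives a high hit rate on incremental re-indexing.
    """
    key = hashlib.sha256(chunk.encode()).hexdigest()
    row = con.execute("SELECT vec FROM emb_cache WHERE hash = ?", (key,)).fetchone()
    if row:
        return json.loads(row[0])  # cache hit: skip the embedding call
    vec = embed_fn(chunk)          # cache miss: pay for one embedding
    con.execute("INSERT INTO emb_cache VALUES (?, ?)", (key, json.dumps(vec)))
    con.commit()
    return vec
```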
MIT-licensed: github.com/elara-labs/code-context-engine
Rajkumar Sakthivel
AI Systems Engineer | Building LLM Applications and Private Cloud at Scale | International Conference Speaker | Oxford
London, United Kingdom