94% of Your AI Coding Tokens Are Wasted: How We Built a Local RAG That Fixes It
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium-sized project, that can mean 45,000 tokens per query when the agent only needs about 4,900 of them. Input tokens typically make up 85-95% of your bill.
We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.
In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.
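The Reciprocal Rank Fusion step mentioned above can be sketched in a few lines. This is a generic RRF implementation, not the project's actual code; the function name and the conventional constant k=60 are assumptions for illustration. Each retriever (vector similarity, BM25) contributes a ranked list, and a document's fused score is the sum of 1/(k + rank) across lists:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids via Reciprocal Rank Fusion.

    rankings: list of ranked lists, best-first (e.g. one from vector
    search, one from BM25). k=60 is the constant from the original
    RRF paper; it damps the influence of top ranks.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
bm25_hits = ["chunk_b", "chunk_c", "chunk_a"]
fused = rrf_fuse([vector_hits, bm25_hits])
# chunk_b ranks first: it places highly in both lists
```

Because RRF only consumes ranks, it needs no score normalization between the vector and BM25 retrievers, which is one reason it is a common alternative to a learned reranker.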
We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.
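The content-hash embedding cache mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the engine's implementation: the class name, schema, and JSON serialization are assumptions. The idea is that a chunk's SHA-256 hash keys its embedding, so re-indexing an unchanged file never re-embeds it:

```python
import hashlib
import json
import sqlite3


class EmbeddingCache:
    """Cache embeddings keyed by a SHA-256 hash of the chunk text."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(chunk_hash TEXT PRIMARY KEY, vector TEXT)"
        )

    def get_or_embed(self, chunk, embed_fn):
        # Identical chunk text always produces the same key,
        # regardless of which file or commit it came from.
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        row = self.db.execute(
            "SELECT vector FROM embeddings WHERE chunk_hash = ?", (key,)
        ).fetchone()
        if row:
            return json.loads(row[0])  # cache hit: no embedding call
        vector = embed_fn(chunk)  # cache miss: call the (expensive) model
        self.db.execute(
            "INSERT INTO embeddings VALUES (?, ?)", (key, json.dumps(vector))
        )
        return vector
```

On a mostly-unchanged codebase, nearly every chunk hits the cache, which is how hit rates like the 96% figure above become possible.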
Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine
Rajkumar Sakthivel
AI Systems Engineer | Building LLM Applications and Private Cloud at Scale | International Conference Speaker | Oxford
London, United Kingdom