94% of Your AI Coding Tokens Are Wasted: How We Built a Local RAG That Fixes It
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium-sized project, that can mean 45,000 tokens per query when the agent only needs about 4,900 of them. Input tokens typically make up 85-95% of your bill.
We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.
In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.
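The Reciprocal Rank Fusion step mentioned above can be sketched in a few lines. This is a generic RRF implementation, not the project's actual code; the function name and the conventional constant k=60 are assumptions for illustration. Each retriever (vector similarity, BM25) contributes a ranked list, and a document's fused score is the sum of 1/(k + rank) across lists:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids via Reciprocal Rank Fusion.

    rankings: list of ranked lists, best-first (e.g. one from vector
    search, one from BM25). k=60 is the constant from the original
    RRF paper; it damps the influence of top ranks.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
bm25_hits = ["chunk_b", "chunk_c", "chunk_a"]
fused = rrf_fuse([vector_hits, bm25_hits])
# chunk_b ranks first: it places highly in both lists
```

Because RRF only consumes ranks, it needs no score normalization between the vector and BM25 retrievers, which is one reason it is a common alternative to a learned reranker.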
We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.
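The content-hash embedding cache mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the engine's implementation: the class name, schema, and JSON serialization are assumptions. The idea is that a chunk's SHA-256 hash keys its embedding, so re-indexing an unchanged file never re-embeds it:

```python
import hashlib
import json
import sqlite3


class EmbeddingCache:
    """Cache embeddings keyed by a SHA-256 hash of the chunk text."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(chunk_hash TEXT PRIMARY KEY, vector TEXT)"
        )

    def get_or_embed(self, chunk, embed_fn):
        # Identical chunk text always produces the same key,
        # regardless of which file or commit it came from.
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        row = self.db.execute(
            "SELECT vector FROM embeddings WHERE chunk_hash = ?", (key,)
        ).fetchone()
        if row:
            return json.loads(row[0])  # cache hit: no embedding call
        vector = embed_fn(chunk)  # cache miss: call the (expensive) model
        self.db.execute(
            "INSERT INTO embeddings VALUES (?, ?)", (key, json.dumps(vector))
        )
        return vector
```

On a mostly-unchanged codebase, nearly every chunk hits the cache, which is how hit rates like the 96% figure above become possible.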
Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine
Rajkumar Sakthivel
AI Systems Engineer | Building LLM Applications and Private Cloud at Scale | International Conference Speaker | Oxford
London, United Kingdom