How We Cut AI Coding Token Costs by 94% with a Local Code Index and Graph Expansion

AI coding agents like Claude Code and Cursor re-read entire files every session. On a medium-sized project, that can mean 45,000 input tokens per query when the agent only needs about 4,900. Input tokens account for 85-95% of your AI tooling bill.

We built Code Context Engine (open source, MIT), a local code-indexing tool that serves only the relevant code chunks through MCP (Model Context Protocol). The result: a 94% token saving, benchmarked on the FastAPI codebase with 20 real queries, at 0.4 ms retrieval latency.
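To see what those token counts mean in dollars, here is a back-of-envelope sketch using the figures above. The per-million-token price is an assumed placeholder, not a quoted rate, and this single-query example lands near 89%; the 94% headline is the benchmark average across the 20 queries.

```python
# Hypothetical cost sketch. The $3-per-million input-token price is an
# ASSUMPTION for illustration; the 45,000 vs 4,900 token figures come
# from the abstract's medium-project example.
PRICE_PER_MILLION = 3.00  # USD, assumed input-token price

def query_cost(tokens: int) -> float:
    """Input-token cost of a single query in USD."""
    return tokens / 1_000_000 * PRICE_PER_MILLION

full_file = query_cost(45_000)  # agent re-reads whole files
indexed   = query_cost(4_900)   # agent gets only relevant chunks

savings = 1 - indexed / full_file
print(f"${full_file:.3f} -> ${indexed:.3f} per query ({savings:.0%} saved)")
```

Multiplied across hundreds of queries a day, the per-query difference is where the real-dollar savings in the demo come from.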

In this session with live demos, we walk through:

- Tree-sitter AST parsing for semantic code chunks (functions, classes, modules) across Python, JavaScript, TypeScript, Go, Rust, Java, PHP
- sqlite-vec for vector search, FTS5 for keyword matching, stored in three SQLite files (2 MB vs 217 MB for LanceDB)
- Hybrid retrieval: vector similarity + BM25 via Reciprocal Rank Fusion with confidence scoring
- Code knowledge graph with CALLS/IMPORTS edges for automatic graph expansion
- Content-hash embedding cache achieving 96% hit rate on re-index
- Secret redaction (AWS keys, GitHub tokens, JWTs) and PII scrubbing before indexing
- Cross-session memory: decisions and code areas persist across sessions
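The hybrid-retrieval step above can be sketched in a few lines. This is a minimal, illustrative Reciprocal Rank Fusion over two ranked lists of chunk IDs (one from vector similarity, one from FTS5/BM25); the function name, chunk IDs, and the conventional k=60 constant are assumptions, not the project's actual API.

```python
from collections import defaultdict

def rrf_fuse(vector_ranked: list[str], bm25_ranked: list[str], k: int = 60):
    """Fuse two rankings with Reciprocal Rank Fusion.

    Each item scores 1 / (k + rank) per list it appears in; k=60 is the
    constant conventionally used with RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in (vector_ranked, bm25_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first: chunks both retrievers agree on win
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fused = rrf_fuse(["auth.py:login", "db.py:connect"],
                 ["auth.py:login", "utils.py:hash"])
print(fused[0][0])  # "auth.py:login" tops both lists, so it ranks first
```

A chunk near the top of both the semantic and the keyword ranking outranks one that only a single retriever liked, which is what makes the fused list a good base for confidence scoring.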

Live demo: we index a project, ask questions, show before/after token counts, and translate the savings into real dollars.

Works with Claude Code, VS Code, Cursor, Gemini CLI, and OpenAI Codex. One index, all editors.

github.com/elara-labs/code-context-engine

Rajkumar Sakthivel

AI Systems Engineer | Building LLM Applications and Private Cloud at Scale | International Conference Speaker | Oxford

London, United Kingdom
