Rajkumar Sakthivel
AI Systems Engineer | Building LLM Applications and Private Cloud at Scale | International Conference Speaker | Oxford
London, United Kingdom
Rajkumar Sakthivel builds AI systems and LLM-powered applications that run in production at enterprise scale. His work spans the full stack: from training sentence transformer models and integrating large language model APIs, to operating the private cloud infrastructure underneath.
He is the co-creator of Code Context Engine, an open-source tool that cuts AI coding agent token usage by 94% using tree-sitter AST parsing, vector search, and the Model Context Protocol. The project is used across Claude Code, Cursor, VS Code, Gemini CLI, and Codex.
Rajkumar speaks regularly at international conferences across Europe and the United States, including the National DevOps Conference (London), Michigan Technology Conference, Data Saturdays Sofia, and DevOps Oxford. His talks focus on what actually breaks when you ship AI to production, how to operate intelligent systems reliably, and practical approaches to reducing AI tooling costs.
He holds a degree from the University of Oxford and is based in London. When he is not debugging model drift at 2am, he writes about minimalism in DevOps and the engineering side of AI adoption that rarely makes it into product announcements.
94% Token Savings for Embedded Teams: Building a Local Code Index with Tree-Sitter and sqlite-vec
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.
We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.
For embedded and systems teams, AI coding costs compound fast: large C++ codebases, monorepos with hardware abstraction layers, and strict toolchain constraints. Code Context Engine runs entirely local (no cloud calls for indexing), works with C, C++, Rust, Python, and Zig via tree-sitter grammars, and integrates with your existing editor through the Model Context Protocol.
In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.
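A content-hash embedding cache can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `EmbeddingCache` class, the choice of SHA-256, and the `fake_embed` stand-in are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Caches embeddings keyed by a SHA-256 hash of the chunk text (hypothetical design)."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, chunk_text: str) -> List[float]:
        # Identical text always hashes to the same key, so unchanged chunks
        # are never re-embedded on a re-index.
        key = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed(chunk_text)
        return self._store[key]

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model, for illustration only.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]
```

Re-indexing a codebase where most chunks are unchanged would mostly hit the cache, which is the effect the quoted hit rate describes.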
We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.
Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine
AI Agents in Production: What Actually Broke
Everyone's shipping AI features. Few are talking about what happens after.
This is a practitioner's postmortem on building and running real AI-enabled systems end to end: from training sentence transformer models, to integrating GPT-4/5 APIs, to keeping the servers alive. No slides full of architecture diagrams that never see production. Just honest lessons from the trenches.
We'll cover:
- What happens when your OpenAI API key expires mid-request and your app has no fallback
- Model accuracy drift in the wild: how to catch it before your users do
- The hidden operational gap between "it works in dev" and "it works at 2am"
- Lessons from bridging Python ML pipelines with PHP production APIs
- What to monitor, what to automate, and what to just accept will break
You'll leave with a practical checklist for hardening AI-integrated systems and a much healthier skepticism of demo-ware.
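As one illustration of the first item in the list above, a fallback for rejected credentials might look like the sketch below. It assumes each provider is wrapped in a plain callable that raises a hypothetical `ProviderAuthError` when its key is expired or revoked; neither name comes from a real SDK.

```python
from typing import Callable, Sequence


class ProviderAuthError(Exception):
    """Raised by a provider wrapper when its credentials are rejected (hypothetical)."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    degraded_reply: str = "This feature is temporarily unavailable.",
) -> str:
    """Try each provider in order and degrade gracefully if all reject the key."""
    for call in providers:
        try:
            return call(prompt)
        except ProviderAuthError:
            # Expired or revoked key: move on to the next provider.
            continue
    # No provider accepted the request, so return a safe static reply
    # instead of surfacing a stack trace to the user.
    return degraded_reply
```

In a real system the `except` branch would also emit an alert, since a silent fallback hides the expired key from the people who need to rotate it.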
From Code to Graph: How CALLS/IMPORTS Edges Power 94% Token Savings in AI Coding
AI coding agents re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. We built Code Context Engine, an open-source tool that uses a code knowledge graph with CALLS and IMPORTS edges to serve only the relevant code chunks.
The graph is central to retrieval quality. When a developer asks about a payment flow, the system finds the top-ranked chunks via hybrid vector and BM25 search, then walks CALLS/IMPORTS edges to pull in related functions from other files automatically. This graph expansion consistently surfaces code the vector search alone would miss.
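The expansion step can be sketched as a bounded breadth-first walk from the top-ranked chunks. The function name, the adjacency-dict representation, and the `max_hops` and `limit` parameters are assumptions for illustration, not the tool's actual interface.

```python
from collections import deque
from typing import Dict, List


def expand_with_graph(
    seeds: List[str],
    edges: Dict[str, List[str]],
    max_hops: int = 1,
    limit: int = 10,
) -> List[str]:
    """Return the seeds followed by chunks reachable via CALLS/IMPORTS edges."""
    ordered = list(dict.fromkeys(seeds))  # de-duplicate, keep ranking order
    seen = set(ordered)
    queue = deque((seed, 0) for seed in ordered)
    while queue and len(ordered) < limit:
        node, depth = queue.popleft()
        if depth >= max_hops:
            continue
        for neighbour in edges.get(node, []):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            ordered.append(neighbour)
            queue.append((neighbour, depth + 1))
            if len(ordered) >= limit:
                break
    return ordered
```

Keeping the hop count small matters: each extra hop widens the result set quickly, which works against the token savings the index exists to deliver.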
The pipeline: tree-sitter AST parsing creates semantic chunks (functions, classes, modules). Each chunk becomes a node. Static analysis extracts edges (function calls, imports). Hybrid retrieval combines vector similarity with BM25 keyword matching via Reciprocal Rank Fusion. A confidence scorer blends vector distance, keyword match, and recency. File diversity filtering prevents one large file from dominating results.
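Reciprocal Rank Fusion itself is compact enough to show in full. The sketch below uses the commonly cited constant k = 60; whether the project uses that value is an assumption, and the chunk ids are made up.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each item scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["a", "b", "c"]  # hypothetical vector-similarity ranking
bm25_hits = ["b", "d", "a"]    # hypothetical BM25 keyword ranking
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF only looks at ranks, it needs no score normalization between the vector and keyword retrievers, which is part of its appeal over a learned reranker.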
Everything is stored in SQLite: sqlite-vec for vector search, FTS5 for keyword search, and a graph table for CALLS/IMPORTS edges. We chose SQLite over a dedicated graph database for simplicity (three files, zero infrastructure, 2 MB install vs 217 MB for LanceDB alone).
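A graph table of this kind needs nothing beyond plain SQLite. The schema below is a guess at a minimal layout, not the project's real one: table names, column names, and sample rows are all hypothetical, and the sqlite-vec and FTS5 parts are left out.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE chunks (
    id     INTEGER PRIMARY KEY,
    file   TEXT NOT NULL,
    symbol TEXT NOT NULL
);
CREATE TABLE edges (
    src  INTEGER NOT NULL REFERENCES chunks(id),
    dst  INTEGER NOT NULL REFERENCES chunks(id),
    kind TEXT NOT NULL CHECK (kind IN ('CALLS', 'IMPORTS')),
    PRIMARY KEY (src, dst, kind)
);
"""


def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up chunks and edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO chunks (id, file, symbol) VALUES (?, ?, ?)",
        [(1, "billing.py", "charge"), (2, "billing.py", "validate_card"), (3, "db.py", "save_payment")],
    )
    conn.executemany(
        "INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
        [(1, 2, "CALLS"), (1, 3, "IMPORTS")],
    )
    conn.commit()
    return conn


def neighbours(conn: sqlite3.Connection, chunk_id: int) -> List[Tuple[str, str]]:
    """Return (symbol, edge kind) for every chunk one hop away from chunk_id."""
    rows = conn.execute(
        "SELECT c.symbol, e.kind FROM edges e JOIN chunks c ON c.id = e.dst "
        "WHERE e.src = ? ORDER BY c.symbol",
        (chunk_id,),
    )
    return [(symbol, kind) for symbol, kind in rows]
```

A one-hop lookup is a single indexed join, which is the kind of query where a dedicated graph database offers little over a table.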
Benchmarked on FastAPI (53 files, 20 real queries): 94% token savings, 0.90 Recall@10, 0.4 ms query latency. The graph expansion step alone improves recall by surfacing related files that pure vector search misses.
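For readers unfamiliar with the metric, Recall@10 in its usual definition is the fraction of relevant items that appear in the top 10 results, averaged over queries. The sketch below shows that standard definition; how the benchmark labelled relevant chunks is not stated here, so the inputs are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear among the first k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)
```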
Attendees will learn how to model code as a graph, how graph edges improve retrieval beyond vector similarity, and practical trade-offs between SQLite graph tables and dedicated graph databases for developer tooling.
MIT-licensed: github.com/elara-labs/code-context-engine
How We Cut AI Coding Token Costs by 94% with a Local Code Index and Graph Expansion
AI coding agents like Claude Code and Cursor re-read entire files every session. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your AI tooling bill.
We built Code Context Engine (open source, MIT), a local code indexing tool that serves only the relevant code chunks through MCP (Model Context Protocol). The result: 94% token savings, benchmarked on FastAPI with 20 real queries, 0.4 ms latency.
In this session with live demos, we walk through:
- Tree-sitter AST parsing for semantic code chunks (functions, classes, modules) across Python, JavaScript, TypeScript, Go, Rust, Java, PHP
- sqlite-vec for vector search, FTS5 for keyword matching, stored in three SQLite files (2 MB vs 217 MB for LanceDB)
- Hybrid retrieval: vector similarity + BM25 via Reciprocal Rank Fusion with confidence scoring
- Code knowledge graph with CALLS/IMPORTS edges for automatic graph expansion
- Content-hash embedding cache achieving 96% hit rate on re-index
- Secret redaction (AWS keys, GitHub tokens, JWTs) and PII scrubbing before indexing
- Cross-session memory: decisions and code areas persist across sessions
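To make the redaction item concrete, here is a minimal regex-based sketch. The patterns are simplified approximations of the well-known AWS access key ID, GitHub token, and JWT shapes; they are illustrative only, will miss some variants, and are not the project's actual rules.

```python
import re

# Simplified, illustrative patterns. A production scanner would need more
# formats and some protection against false positives.
_PATTERNS = [
    ("AWS_ACCESS_KEY_ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GITHUB_TOKEN", re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}")),
    ("JWT", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
]


def redact(text: str) -> str:
    """Replace anything that looks like a secret with a labelled placeholder."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Running redaction before embedding means a secret never reaches the index, the cache, or any prompt built from retrieved chunks.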
Live demo: index a project, ask questions, show before/after token counts, demonstrate savings in real dollars.
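The before/after comparison is simple arithmetic, sketched below. The price per million input tokens is a placeholder assumption, not a quoted rate. Note that the illustrative per-query figures used earlier (45,000 versus 4,900 tokens) work out to roughly 89%; the 94% headline is the separate FastAPI benchmark result.

```python
def savings_percent(baseline_tokens: int, reduced_tokens: int) -> float:
    """Percentage of input tokens avoided relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - reduced_tokens) / baseline_tokens


def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of a number of input tokens at a given price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


HYPOTHETICAL_PRICE = 3.0  # USD per million input tokens; placeholder value

per_query_saving = cost_usd(45_000, HYPOTHETICAL_PRICE) - cost_usd(4_900, HYPOTHETICAL_PRICE)
```

Multiplying the per-query saving by daily query volume and team size gives the dollar figure for a given team.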
Works with Claude Code, VS Code, Cursor, Gemini CLI, and OpenAI Codex. One index, all editors.
github.com/elara-labs/code-context-engine
Your AI Reads Entire Files. It Only Needs 6%. Here's How We Fixed It.
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.
We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.
In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.
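File diversity filtering, one of the steps listed above, can be approximated with a per-file cap applied to an already ranked list. The cap of two chunks per file and the function name are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def diversify_by_file(
    ranked: Sequence[Tuple[str, str]],
    max_per_file: int = 2,
    limit: int = 10,
) -> List[Tuple[str, str]]:
    """Keep ranking order but allow at most max_per_file chunks from any one file.

    Each item in ranked is a (chunk_id, file_path) pair, best match first.
    """
    counts: Dict[str, int] = {}
    kept: List[Tuple[str, str]] = []
    for chunk_id, path in ranked:
        if counts.get(path, 0) >= max_per_file:
            continue  # this file already has its share of the results
        counts[path] = counts.get(path, 0) + 1
        kept.append((chunk_id, path))
        if len(kept) == limit:
            break
    return kept
```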
We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.
Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine
What Actually Breaks When You Ship AI to Production
Deploying an LLM is the easy part. Keeping it reliable, accurate, and resilient in production is where teams get burned, and most never see it coming.
This session is a practitioner's postmortem from building and operating end-to-end AI-enabled systems using GPT-4 and sentence transformers across Python and PHP stacks, from model training through API integration to server infrastructure.
What we'll cover:
- API key expiry silently taking down live features and how to design fallbacks that actually work
- Model accuracy drift in production: how to detect it before your users do
- The gap between dev behavior and 2am production behavior under real traffic
- Bridging Python ML pipelines with PHP production APIs
- What to monitor, what to automate, and what to just accept will break
- A practical pre-launch checklist for AI-integrated systems
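For the drift item in the list above, one simple detection approach is a rolling accuracy window compared against a baseline. The sketch assumes you can label at least a sample of production predictions as correct or not; the class name, window size, and tolerance are hypothetical choices.

```python
from collections import deque
from typing import Deque, Optional


class DriftMonitor:
    """Flags drift when rolling accuracy falls below baseline by more than a tolerance."""

    def __init__(self, baseline_accuracy: float, window: int = 100, tolerance: float = 0.05) -> None:
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self._outcomes: Deque[bool] = deque(maxlen=window)

    def record(self, correct: bool) -> None:
        # Oldest outcomes fall out automatically once the window is full.
        self._outcomes.append(correct)

    def rolling_accuracy(self) -> Optional[float]:
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def drifted(self) -> bool:
        # Only judge once the window is full, to avoid alerting on a few samples.
        if len(self._outcomes) < self._outcomes.maxlen:
            return False
        return self.baseline_accuracy - self.rolling_accuracy() > self.tolerance
```

Wiring `drifted()` to an alert gives the team a signal before users start reporting bad results.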
This is real-world experience, not a framework walkthrough. No vendor demos. No slides that only work in theory.
94% of Your AI Coding Tokens Are Wasted: How We Built a Local RAG That Fixes It
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.
We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.
In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.
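A confidence score of the kind mentioned above could be a weighted blend of signals. The sketch assumes the blend covers vector distance, keyword match, and recency; the weights, the half-life, and the mapping of each signal to a 0..1 range are all invented for the example.

```python
from typing import Tuple


def confidence(
    vector_distance: float,
    keyword_score: float,
    age_days: float,
    weights: Tuple[float, float, float] = (0.6, 0.3, 0.1),
    half_life_days: float = 30.0,
) -> float:
    """Blend three signals into a 0..1 confidence value (hypothetical formula)."""
    # Smaller distance means more similar; map [0, inf) onto (0, 1].
    similarity = 1.0 / (1.0 + max(vector_distance, 0.0))
    # Assume the keyword score is already normalized; clamp it to [0, 1].
    keyword = min(max(keyword_score, 0.0), 1.0)
    # Exponential decay: a chunk loses half its recency weight every half-life.
    recency = 0.5 ** (max(age_days, 0.0) / half_life_days)
    w_vec, w_kw, w_rec = weights
    total = w_vec + w_kw + w_rec
    return (w_vec * similarity + w_kw * keyword + w_rec * recency) / total
```

A threshold on this value lets the server return nothing rather than low-quality chunks, which keeps irrelevant tokens out of the agent's context.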
We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.
Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine
Building an MCP Server That Cuts AI Coding Tokens by 94%: Architecture and Benchmarks
AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need context. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.
We built Code Context Engine, an open-source MCP server that indexes your codebase locally using tree-sitter AST parsing and sqlite-vec embeddings. One index serves Claude Code, Cursor, VS Code, Gemini CLI, and Codex simultaneously through the standard MCP protocol. The result: 94% token savings, benchmarked on FastAPI with 20 real queries.
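MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds the request an editor might send and the text result a server might return; the tool name `search_code` and its arguments are hypothetical, not the project's real tool schema.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """JSON-RPC 2.0 request asking an MCP server to run one of its tools."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def build_tool_result(request_id: int, text: str) -> Dict[str, Any]:
    """JSON-RPC 2.0 response carrying a single text content item."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


# Hypothetical exchange: the tool name and argument keys are made up.
request = build_tool_call(1, "search_code", {"query": "payment flow", "limit": 5})
response = build_tool_result(1, "billing.py::charge\n...")
wire_request = json.dumps(request)
```

Because every client speaks the same protocol, one server process and one index can serve several editors at once.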
In this talk we cover: tree-sitter AST chunking, hybrid retrieval (vector + BM25 via Reciprocal Rank Fusion), confidence scoring, content-hash embedding cache (96% hit rate), secret redaction, and cross-session memory via a SQLite knowledge graph with CALLS/IMPORTS edges.
We share live benchmarks, MCP integration demos, and engineering tradeoffs: sqlite-vec over LanceDB (99% smaller), truncation over LLM summarization, RRF over learned reranking.
MIT-licensed: github.com/elara-labs/code-context-engine
What Actually Breaks When You Ship AI to Production
Everyone on your team is excited to ship AI features. Nobody talks about what happens the week after.
This is a practitioner's postmortem from building and running end-to-end AI-enabled systems, from training sentence transformer models to integrating GPT-4/5 APIs, across Python and PHP stacks, all the way down to the server. No theory. No vendor demos. Just what broke, why it broke, and what we'd do differently.
What we'll cover:
- OpenAI API key expiry silently taking down live features and how to design fallbacks your whole team can reason about
- Model accuracy drift in production: how to catch it before your users file a bug report
- The gap between "it works in dev" and "it works at 2am under real traffic"
- Bridging Python ML pipelines with PHP production APIs without losing your mind
- What to monitor, what to automate, and what to just accept will occasionally break
This talk is for developers, testers, and anyone involved in shipping software that has an AI component. You don't need an ML background; you need to understand what questions to ask before you go live.
LLMOps: Operationalizing Large Language Models for the Real World
Large Language Models (LLMs) like OpenAI GPT, Google Gemini, and Databricks DBRX are transforming industries, but effectively deploying and managing them requires more than traditional machine learning practices. LLMOps is a specialized set of techniques, tools, and workflows designed to tackle the unique challenges of working with LLMs in production. This presentation will explore what makes LLMOps distinct, why it's essential, and how it enables organizations to harness the power of LLMs efficiently, at scale, and with reduced risk.
- Introduction to LLMOps: Overview of its components from data preparation to deployment and monitoring
- MLOps to LLMOps: Key differences including computational demands, fine-tuning with human feedback, and prompt engineering
- Challenges & Solutions: Addressing LLM-specific issues like inference cost, model drift, and hallucination
- Best Practices: Insights into data prep, governance, CI/CD pipelines, and model monitoring
- Platform Tools: Exploring platforms like MLflow and Databricks for effective LLMOps implementation
Operationalizing Generative AI: From Concept to Productionizing LLM Solutions with Azure AI Foundry
Take your generative AI projects from experimentation to production with this comprehensive workshop focused on Microsoft's Azure AI Foundry. This hands-on session will guide you through the complete lifecycle of building and deploying enterprise-grade LLM solutions.
We will start with foundational concepts and progress to advanced implementation techniques covering the following:
- End-to-end LLM development workflows in Azure AI Foundry
- Model selection strategies (GPT, Deepseek, Llama) and optimization approaches
- Building production-ready pipelines for fine-tuning and Agentic AI solution implementations
- Deployment architectures for scalable, secure LLM applications
Through interactive labs and real-world case studies, you'll gain practical experience with:
- Azure AI Studio for rapid prototyping and experimentation
- Prompt Flow for building reproducible LLM pipelines
- Model evaluation and continuous monitoring best practices
- Cost optimization and performance tuning techniques
- Implementing responsible AI safeguards and governance controls
Challenges and Solutions for ML, LLM, and Agentic Deployments
As enterprises race to adopt AI technologies, they face a complex set of challenges across the lifecycle of machine learning (ML), large language models (LLMs), and agentic systems. This panel brings together experts to explore the current state of AI in the enterprise, highlighting real-world use cases and transformative potential. Panelists will dive into critical issues such as securing autonomous agents, explaining AI behavior to non-technical stakeholders, building infrastructure for scalable LLM deployments, and maintaining ML model performance over time. Whether you’re just starting your AI journey or looking to refine your deployment strategy, this session offers practical insights, emerging best practices, and strategic guidance for navigating the fast-evolving AI landscape.
Hacking Parenthood: A Software Engineer’s Guide to Raising Future-Ready Kids
Balancing the demands of a software engineering career with the responsibilities of parenthood is no small feat. This talk is designed to provide actionable strategies for raising confident, adaptable, and future-ready children in the midst of a busy life.
Tailored specifically for software engineer families, we'll explore how to optimize daily routines, foster meaningful connections, and integrate work-life balance. Whether you're writing code or guiding your children through their formative years, this session will offer practical tips to help you thrive in both worlds.
Data Saturdays Sofia 2025
Michigan Technology Conference 2025
National DevOps Conference 2024
Power of Minimalism in DevOps 2024
DevOps Oxford
Minimalism in DevOps 2024
PHP Sussex & PHP Oxford
A Software Engineer’s Guide to Raising Future-Ready Kids
PHP Stoke
Minimalism in DevOps 2023
National DevOps Conference 2021
Are You a Modern DevOps Engineer?
London Java Community & PHP Vegas
Modern Software Developer Best Practices 2020