Speaker

Rajkumar Sakthivel

AI Systems Engineer | Building LLM Applications and Private Cloud at Scale | International Conference Speaker | Oxford

London, United Kingdom

Rajkumar Sakthivel builds AI systems and LLM-powered applications that run in production at enterprise scale. His work spans the full stack: from training sentence transformer models and integrating large language model APIs, to operating the private cloud infrastructure underneath.

He is the co-creator of Code Context Engine, an open-source tool that cuts AI coding agent token usage by 94% using tree-sitter AST parsing, vector search, and the Model Context Protocol. The project is used across Claude Code, Cursor, VS Code, Gemini CLI, and Codex.

Rajkumar speaks regularly at international conferences across Europe and the United States, including the National DevOps Conference (London), Michigan Technology Conference, Data Saturdays Sofia, and DevOps Oxford. His talks focus on what actually breaks when you ship AI to production, how to operate intelligent systems reliably, and practical approaches to reducing AI tooling costs.

He holds a degree from the University of Oxford and is based in London. When he is not debugging model drift at 2am, he writes about minimalism in DevOps and the engineering side of AI adoption that rarely makes it into product announcements.

Area of Expertise

  • Finance & Banking
  • Information & Communications Technology

Topics

  • PHP
  • DevOps
  • Data Science & AI
  • LLMs
  • LLMOps
  • MLOps
  • AIOps
  • Python
  • Artificial Intelligence
  • Artificial Intelligence (AI) and Machine Learning
  • Machine Learning & AI
  • Software Development
  • Observability
  • Monitoring & Observability
  • Software Architecture & Scalability

94% Token Savings for Embedded Teams: Building a Local Code Index with Tree-Sitter and sqlite-vec

AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.

We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.
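
To make the chunking step concrete, here is a minimal sketch of function-level AST chunking with py-tree-sitter (assuming version 0.22 or later; this is an illustration, not the Code Context Engine source):

```python
# Minimal AST chunking sketch with py-tree-sitter (0.22+ assumed).
# Illustrative only; not the Code Context Engine implementation.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser()
parser.language = Language(tspython.language())

def chunk_functions(source: bytes):
    """Yield one chunk per top-level function or class, not the whole file."""
    tree = parser.parse(source)
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            name = node.child_by_field_name("name")
            yield {
                "name": name.text.decode() if name else "<anonymous>",
                "lines": (node.start_point[0] + 1, node.end_point[0] + 1),
                "text": source[node.start_byte : node.end_byte].decode(),
            }

for chunk in chunk_functions(b"def charge(invoice):\n    return invoice.total\n"):
    print(chunk["name"], chunk["lines"])
```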

For embedded and systems teams, AI coding costs compound fast: large C++ codebases, monorepos with hardware abstraction layers, and strict toolchain constraints all inflate the context an agent must read. Code Context Engine runs entirely locally (no cloud calls for indexing), works with C, C++, Rust, Python, and Zig via tree-sitter grammars, and integrates with your existing editor through the Model Context Protocol.

In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.
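
To give a flavour of the fusion step, here is a minimal Reciprocal Rank Fusion sketch (k = 60 is a common default, assumed here rather than taken from the engine's source):

```python
# Minimal Reciprocal Rank Fusion sketch; k = 60 is a common default,
# assumed here rather than taken from the engine's source.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. vector and BM25 results) into one."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["payments.py:charge", "models.py:Invoice", "api.py:webhook"]
bm25_hits = ["api.py:webhook", "payments.py:charge", "utils.py:retry"]
print(rrf([vector_hits, bm25_hits]))  # items ranked by both lists rise to the top
```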

We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.

Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine

AI Agents in Production: What Actually Broke

Everyone's shipping AI features. Few are talking about what happens after.

This is a practitioner's postmortem on building and running real AI-enabled systems end-to-end: from training sentence transformer models, to integrating GPT-4/5 APIs, to keeping the servers alive. No slides full of architecture diagrams that never see production. Just honest lessons from the trenches.

We'll cover:
- What happens when your OpenAI API key expires mid-request and your app has no fallback (see the sketch after this list)
- Model accuracy drift in the wild: how to catch it before your users do
- The hidden operational gap between "it works in dev" and "it works at 2am"
- Lessons from bridging Python ML pipelines with PHP production APIs
- What to monitor, what to automate, and what to just accept will break
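
A minimal sketch of the fallback pattern from the first bullet (provider names and retry policy are placeholders, not a prescription):

```python
# Hedged sketch of a provider fallback chain for LLM calls.
# Providers and retry policy are placeholders, not a prescription.
import time

def call_with_fallback(prompt: str, providers: list, retries: int = 2) -> str:
    """Try each provider in order; raise only after every one has failed."""
    last_error = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider(prompt)
            except Exception as exc:  # auth failures, rate limits, timeouts
                last_error = exc
                time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("all providers failed") from last_error

# Usage: call_with_fallback("Summarise this ticket", [openai_call, local_model_call])
```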

You'll leave with a practical checklist for hardening AI-integrated systems and a much healthier skepticism of demo-ware.

From Code to Graph: How CALLS/IMPORTS Edges Power 94% Token Savings in AI Coding

AI coding agents re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. We built Code Context Engine, an open-source tool that uses a code knowledge graph with CALLS and IMPORTS edges to serve only the relevant code chunks.

The graph is central to retrieval quality. When a developer asks about a payment flow, the system finds the top-ranked chunks via hybrid vector and BM25 search, then walks CALLS/IMPORTS edges to pull in related functions from other files automatically. This graph expansion consistently surfaces code the vector search alone would miss.

The pipeline: tree-sitter AST parsing creates semantic chunks (functions, classes, modules). Each chunk becomes a node. Static analysis extracts edges (function calls, imports). Hybrid retrieval combines vector similarity with BM25 keyword matching via Reciprocal Rank Fusion. A confidence scorer blends vector distance, keyword match, and recency. File diversity filtering prevents one large file from dominating results.
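
For intuition, here is a toy version of the edge-extraction step using Python's built-in ast module (a simplification: the real pipeline is tree-sitter based and multi-language):

```python
# Toy CALLS/IMPORTS edge extraction with the stdlib ast module.
# A simplification; the real pipeline is tree-sitter based and multi-language.
import ast

def extract_edges(source: str, module: str) -> list[tuple[str, str, str]]:
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            target = getattr(node, "module", None) or node.names[0].name
            edges.append((module, target, "IMPORTS"))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            edges.append((module, node.func.id, "CALLS"))
    return edges

print(extract_edges("import os\ncharge(invoice)", "payments"))
# [('payments', 'os', 'IMPORTS'), ('payments', 'charge', 'CALLS')]
```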

Everything is stored in SQLite: sqlite-vec for vector search, FTS5 for keyword search, and a graph table for CALLS/IMPORTS edges. We chose SQLite over a dedicated graph database for simplicity (three files, zero infrastructure, 2 MB install vs 217 MB for LanceDB alone).
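
A minimal sketch of that single-store layout (table and column names are assumptions for illustration, not the project's actual schema):

```python
# Sketch of the SQLite layout: sqlite-vec for vectors, FTS5 for keywords,
# and a plain table for CALLS/IMPORTS edges. Schema names are assumptions.
import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32

db = sqlite3.connect("index.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

db.executescript("""
  CREATE VIRTUAL TABLE IF NOT EXISTS chunks_vec USING vec0(embedding float[384]);
  CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(name, body);
  CREATE TABLE IF NOT EXISTS edges (src INTEGER, dst INTEGER, kind TEXT);
""")

# Nearest-neighbour lookup, then one hop of CALLS/IMPORTS graph expansion.
query_vec = [0.0] * 384  # stand-in for a real query embedding
top = db.execute(
    "SELECT rowid FROM chunks_vec WHERE embedding MATCH ? ORDER BY distance LIMIT 5",
    (serialize_float32(query_vec),),
).fetchall()
placeholders = ",".join("?" * len(top))
related = db.execute(
    f"SELECT DISTINCT dst FROM edges WHERE kind IN ('CALLS','IMPORTS') "
    f"AND src IN ({placeholders})",
    [r[0] for r in top],
).fetchall()
```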

Benchmarked on FastAPI (53 files, 20 real queries): 94% token savings, 0.90 Recall@10, 0.4ms query latency. The graph expansion step alone improves recall by surfacing related files that pure vector search misses.

Attendees will learn how to model code as a graph, how graph edges improve retrieval beyond vector similarity, and practical trade-offs between SQLite graph tables and dedicated graph databases for developer tooling.

MIT-licensed: github.com/elara-labs/code-context-engine

Building Intelligent Enterprise Data Applications in Regulated, Large-Scale Environments

Delivering intelligent, data-driven features in enterprise applications is rarely about adopting new tools. Regulation, data ownership, performance constraints, and existing platforms usually drive the architecture. In this session, two practitioners with over 15 years of experience share how they designed and operated intelligent application features on top of established enterprise data platforms in large, regulated environments.

The session covers how SQL Server–based systems, Snowflake, and Databricks were used together to support transactional, analytical, and application workloads. We’ll discuss architectural decisions that worked, those that didn’t, and how governance and operational constraints shaped the final designs. The focus is on real trade-offs rather than idealised architectures, with examples drawn from production systems.

Agentic AI Architecture & Design Patterns

Agentic AI systems go beyond traditional AI by autonomously planning, reasoning, and acting to achieve goals. This session explores the architectural foundations and design patterns that enable reliable, scalable, and controllable agentic AI solutions. Participants will learn how to structure agent workflows, manage memory and tools, orchestrate multi-agent collaboration, and apply proven patterns to real-world use cases.

How We Cut AI Coding Token Costs by 94% with a Local Code Index and Graph Expansion

AI coding agents like Claude Code and Cursor re-read entire files every session. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your AI tooling bill.

We built Code Context Engine (open source, MIT), a local code indexing tool that serves only the relevant code chunks through MCP (Model Context Protocol). The result: 94% token savings, benchmarked on FastAPI with 20 real queries, 0.4ms latency.

In this session with live demos, we walk through:

- Tree-sitter AST parsing for semantic code chunks (functions, classes, modules) across Python, JavaScript, TypeScript, Go, Rust, Java, PHP
- sqlite-vec for vector search, FTS5 for keyword matching, stored in three SQLite files (2 MB vs 217 MB for LanceDB)
- Hybrid retrieval: vector similarity + BM25 via Reciprocal Rank Fusion with confidence scoring
- Code knowledge graph with CALLS/IMPORTS edges for automatic graph expansion
- Content-hash embedding cache achieving 96% hit rate on re-index (sketched after this list)
- Secret redaction (AWS keys, GitHub tokens, JWTs) and PII scrubbing before indexing
- Cross-session memory: decisions and code areas persist across sessions
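
A minimal sketch of the content-hash cache from the list above (the keying scheme is an assumed design: hash the chunk text so unchanged chunks skip re-embedding):

```python
# Content-hash embedding cache sketch. Assumed design: key by SHA-256 of
# the chunk text so unchanged chunks are never re-embedded on re-index.
import hashlib, json, sqlite3

db = sqlite3.connect("cache.db")
db.execute("CREATE TABLE IF NOT EXISTS emb_cache (hash TEXT PRIMARY KEY, vec TEXT)")

def embed_with_cache(text: str, embed_fn) -> list[float]:
    key = hashlib.sha256(text.encode()).hexdigest()
    row = db.execute("SELECT vec FROM emb_cache WHERE hash = ?", (key,)).fetchone()
    if row:  # hit: identical content was embedded before
        return json.loads(row[0])
    vec = embed_fn(text)  # miss: compute once, store for next re-index
    db.execute("INSERT INTO emb_cache VALUES (?, ?)", (key, json.dumps(vec)))
    return vec
```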

Live demo: index a project, ask questions, show before/after token counts, demonstrate savings in real dollars.

Works with Claude Code, VS Code, Cursor, Gemini CLI, and OpenAI Codex. One index, all editors.

github.com/elara-labs/code-context-engine

Mastering AI Agents for Databases

Dive into building your first AI agent with LangChain, where you'll learn to create, run, and optimize AI agents for diverse database scenarios. The session covers CSV and SQL database agents, demonstrating how to read data, customize responses, and enhance user interaction.
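
As a taste of the session, a hedged sketch of a SQL database agent (LangChain's APIs change quickly; this follows the langchain-community layout and may differ in your installed version):

```python
# Hedged SQL-agent sketch; LangChain APIs shift between releases,
# so treat module paths and signatures as version-dependent.
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///sales.db")  # placeholder database
agent = create_sql_agent(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    db=db,
    verbose=True,
)
print(agent.invoke({"input": "Which product had the highest revenue last month?"}))
```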

Practical AI Agents with OpenAI Agents SDK

AI agents address the challenge of complex scenarios that demand not only generating text but also grounding responses in real data and taking action. This session empowers you to leverage retrieval-augmented generation (RAG), knowledge graphs, and agent-based architectures to engineer truly intelligent behavior. By combining large language models (LLMs) with up-to-date information retrieval and structured knowledge, you'll create AI agents capable of deeper reasoning and more reliable problem-solving. You'll leave with a practical roadmap from concept to implementation: how to connect language models with external data via RAG pipelines to increase factual accuracy, and how to incorporate knowledge graphs for context-rich reasoning.
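
A minimal sketch with the OpenAI Agents SDK (package openai-agents; the Agent/Runner names follow its documented API, but treat details as version-dependent):

```python
# Minimal OpenAI Agents SDK sketch (pip install openai-agents).
# Names follow the documented Agent/Runner API; treat as version-dependent.
from agents import Agent, Runner

agent = Agent(
    name="research-assistant",
    instructions="Answer using retrieved context; say when you are unsure.",
)
result = Runner.run_sync(agent, "Summarise our refund policy in two sentences.")
print(result.final_output)
```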

Your AI Reads Entire Files. It Only Needs 6%. Here's How We Fixed It.

AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.

We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.

In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.

We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.

Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine

Agentic Architectural Patterns for Building Multi-Agent Systems

Generative AI has moved beyond the hype, and enterprises now face the challenge of turning prototypes into scalable solutions. Starting with a GenAI maturity model, you'll learn how to assess your organization's readiness and create a roadmap toward agentic AI adoption. You'll master foundational topics such as model selection and LLM deployment, progressing to advanced methods such as RAG, fine-tuning, in-context learning, and LLMOps, especially in the context of agentic AI. This session introduces a concrete, hierarchical multi-agent architecture where high-level orchestrator agents manage complex business workflows by delegating entire sub-processes to specialized agents. You'll see how these agents collaborate and communicate using the Agent-to-Agent (A2A) protocol.

What Actually Breaks When You Ship AI to Production

Deploying an LLM is the easy part. Keeping it reliable, accurate, and resilient in production is where teams get burned, and most never see it coming.

This session is a practitioner's postmortem from building and operating end-to-end AI-enabled systems using GPT-4 and sentence transformers across Python and PHP stacks, from model training through API integration to server infrastructure.

What we'll cover:
- API key expiry silently taking down live features and how to design fallbacks that actually work
- Model accuracy drift in production: how to detect it before your users do (see the sketch after this list)
- The gap between dev behavior and 2am production behavior under real traffic
- Bridging Python ML pipelines with PHP production APIs
- What to monitor, what to automate, and what to just accept will break
- A practical pre-launch checklist for AI-integrated systems
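
A toy version of the drift check mentioned above (a fixed canary set is one common tactic, assumed here for illustration):

```python
# Toy accuracy-drift check: score a fixed canary set on a schedule and
# alert when accuracy drops past a tolerance. One tactic among several.
def drift_alert(model, canary_set, baseline: float, tolerance: float = 0.05) -> bool:
    correct = sum(model(x) == y for x, y in canary_set)
    return correct / len(canary_set) < baseline - tolerance

# Example: baseline accuracy 0.92 at launch; alert when a run dips below 0.87.
```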

This is real-world experience, not a framework walkthrough. No vendor demos. No slides that only work in theory.

94% of Your AI Coding Tokens Are Wasted: How We Built a Local RAG That Fixes It

AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need to understand your code. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.

We built Code Context Engine, an open-source tool that indexes your codebase locally using tree-sitter AST parsing, stores vector embeddings in sqlite-vec, and serves relevant code chunks through the Model Context Protocol (MCP). The result: 94% token savings, benchmarked on FastAPI with 20 real coding queries.

In this talk, we walk through: tree-sitter AST parsing for semantic code chunks, hybrid retrieval combining vector similarity with BM25 via Reciprocal Rank Fusion, confidence scoring, file diversity filtering, content-hash embedding cache (96% hit rate), secret redaction and PII scrubbing, and cross-session memory using a SQLite-backed knowledge graph with CALLS/IMPORTS edges.

We show live benchmarks, demonstrate MCP integration across Claude Code, Cursor, VS Code, Gemini CLI, and Codex, and share engineering decisions behind choosing sqlite-vec over LanceDB (99% smaller install), truncation compression over LLM summarization, and RRF over learned reranking.

Attendees learn how to build a local RAG for code, measure token savings rigorously, and reduce AI tooling costs. MIT-licensed: github.com/elara-labs/code-context-engine

Building an MCP Server That Cuts AI Coding Tokens by 94%: Architecture and Benchmarks

AI coding agents like Claude Code, Cursor, and Copilot re-read entire files every time they need context. On a medium project, that is 45,000 tokens per query when the agent only needs 4,900. Input tokens are 85-95% of your bill.

We built Code Context Engine, an open-source MCP server that indexes your codebase locally using tree-sitter AST parsing and sqlite-vec embeddings. One index serves Claude Code, Cursor, VS Code, Gemini CLI, and Codex simultaneously through the standard MCP protocol. The result: 94% token savings, benchmarked on FastAPI with 20 real queries.
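
To show how little ceremony an MCP server needs, a minimal sketch with the official Python SDK's FastMCP (the tool name and return shape are assumptions, not the project's actual interface):

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP.
# Tool name and return shape are assumptions, not the project's interface.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("code-context")

def lookup_index(query: str, limit: int) -> list[str]:
    return [f"stub result for {query!r}"][:limit]  # stand-in for real retrieval

@mcp.tool()
def search_code(query: str, limit: int = 10) -> list[str]:
    """Return the most relevant code chunks for a natural-language query."""
    return lookup_index(query, limit)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so any MCP-capable editor can connect
```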

In this talk we cover: tree-sitter AST chunking, hybrid retrieval (vector + BM25 via Reciprocal Rank Fusion), confidence scoring, content-hash embedding cache (96% hit rate), secret redaction, and cross-session memory via a SQLite knowledge graph with CALLS/IMPORTS edges.

We share live benchmarks, MCP integration demos, and engineering tradeoffs: sqlite-vec over LanceDB (99% smaller), truncation over LLM summarization, RRF over learned reranking.

MIT-licensed: github.com/elara-labs/code-context-engine

What Actually Breaks When You Ship AI to Production

Everyone on your team is excited to ship AI features. Nobody talks about what happens the week after.

This is a practitioner's postmortem from building and running end-to-end AI-enabled systems: from training sentence transformer models to integrating GPT-4/5 APIs, across Python and PHP stacks, all the way down to the server. No theory. No vendor demos. Just what broke, why it broke, and what we'd do differently.

What we'll cover:
- OpenAI API key expiry silently taking down live features and how to design fallbacks your whole team can reason about
- Model accuracy drift in production: how to catch it before your users file a bug report
- The gap between "it works in dev" and "it works at 2am under real traffic"
- Bridging Python ML pipelines with PHP production APIs without losing your mind
- What to monitor, what to automate, and what to just accept will occasionally break

This talk is for developers, testers, and anyone involved in shipping software that has an AI component. You don't need an ML background; you need to understand what questions to ask before you go live.

LLMOps: Operationalizing Large Language Models for the Real World

Large Language Models (LLMs) like OpenAI GPT, Google Gemini, and Databricks DBRX are transforming industries, but effectively deploying and managing them requires more than traditional machine learning practices. LLMOps is a specialized set of techniques, tools, and workflows designed to tackle the unique challenges of working with LLMs in production. This presentation will explore what makes LLMOps distinct, why it's essential, and how it enables organizations to harness the power of LLMs efficiently, at scale, and with reduced risks.

- Introduction to LLMOps: overview of its components, from data preparation to deployment and monitoring
- MLOps to LLMOps: key differences, including computational demands, fine-tuning with human feedback, and prompt engineering
- Challenges & Solutions: addressing LLM-specific issues like inference cost, model drift, and hallucination
- Best Practices: insights into data prep, governance, CI/CD pipelines, and model monitoring
- Platform Tools: exploring platforms like MLflow and Databricks for effective LLMOps implementation
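
As one concrete example of the tooling, a hedged MLflow sketch logging a prompt evaluation run (parameter and metric names here are illustrative, not a standard schema):

```python
# Hedged MLflow sketch: track a prompt version's evaluation run.
# Parameter and metric names are illustrative, not a standard schema.
import mlflow

with mlflow.start_run(run_name="prompt-v3-eval"):
    mlflow.log_param("model", "gpt-4o")
    mlflow.log_param("prompt_version", "v3")
    mlflow.log_metric("hallucination_rate", 0.04)
    mlflow.log_metric("avg_inference_cost_usd", 0.0021)
    mlflow.log_dict({"prompt": "...", "response": "..."}, "sample_exchange.json")
```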

Operationalizing Generative AI: From Concept to Productionizing LLM Solutions with Azure AI Foundry

Take your generative AI projects from experimentation to production with this comprehensive workshop focused on Microsoft's Azure AI Foundry. This hands-on session will guide you through the complete lifecycle of building and deploying enterprise-grade LLM solutions.

We will start with foundational concepts and progress to advanced implementation techniques covering the following:
- End-to-end LLM development workflows in Azure AI Foundry
- Model selection strategies (GPT, DeepSeek, Llama) and optimization approaches
- Building production-ready pipelines for fine-tuning and Agentic AI solution implementations
- Deployment architectures for scalable, secure LLM applications

Through interactive labs and real-world case studies, you'll gain practical experience with:
- Azure AI Studio for rapid prototyping and experimentation
- Prompt Flow for building reproducible LLM pipelines
- Model evaluation and continuous monitoring best practices
- Cost optimization and performance tuning techniques
- Implementing responsible AI safeguards and governance controls

Evolving from MLOps to LLMOps - Architectures and Best Practices

This session explores how AI operations are evolving from traditional MLOps to the new world of LLMOps. As large language models transform how we build AI systems, we'll break down what's different and what stays the same when operationalizing these powerful models.

You will learn practical architectures for managing the complete lifecycle, from data preparation and model training to deployment and monitoring. We'll compare standard MLOps workflows with the new requirements of LLMOps, including prompt management, output validation, and cost optimization for large-scale models.

Using real-world examples with Databricks and MLflow, we will show how to implement these approaches effectively. Whether you're working with traditional machine learning models or cutting-edge LLMs, you'll leave with actionable strategies to streamline your AI operations and deployment pipelines.

Challenges and Solutions for ML, LLM, and Agentic Deployments

As enterprises race to adopt AI technologies, they face a complex set of challenges across the lifecycle of machine learning (ML), large language models (LLMs), and agentic systems. This panel brings together experts to explore the current state of AI in the enterprise, highlighting real-world use cases and transformative potential. Panelists will dive into critical issues such as securing autonomous agents, explaining AI behavior to non-technical stakeholders, building infrastructure for scalable LLM deployments, and maintaining ML model performance over time. Whether you’re just starting your AI journey or looking to refine your deployment strategy, this session offers practical insights, emerging best practices, and strategic guidance for navigating the fast-evolving AI landscape.

LLMSecOps – Building Secure and Reliable AI Applications

In this hands-on session, we will dive into the world of LLMSecOps (Large Language Model Security Operations), which focuses on the critical security aspects of building and deploying Large Language Model (LLM) applications. Unlike traditional development, LLM applications face unique security risks that require a systematic approach to address security at every phase, from design through to post-deployment.

Throughout this interactive lab, you will gain practical experience leveraging LLMSecOps principles to build reliable and secure intelligent applications. By the end of the session, you will be equipped with the skills to apply LLMSecOps best practices, helping you secure and enhance the intelligence of your AI applications.

Hacking Parenthood: A Software Engineer’s Guide to Raising Future-Ready Kids

Balancing the demands of a software engineering career with the responsibilities of parenthood is no small feat. This talk is designed to provide actionable strategies for raising confident, adaptable, and future-ready children in the midst of a busy life.

Tailored specifically for software engineer families, we'll explore how to optimize daily routines, foster meaningful connections, and integrate work-life balance. Whether you're writing code or guiding your children through their formative years, this session will offer practical tips to help you thrive in both worlds.

LLMOps: Operationalizing Large Language Models for the Real World

Large Language Models (LLMs) like OpenAI GPT, Google Gemini, and Databricks DBRX are transforming industries, but effectively deploying and managing them requires more than traditional machine learning practices.

LLMOps is a specialized set of techniques, tools, and workflows designed to tackle the unique challenges of working with LLMs in production.

This presentation will explore what makes LLMOps distinct, why it's essential, and how it enables organizations to harness the power of LLMs efficiently, at scale, and with reduced risks.

Shipping AI Inside Laravel: From API Call to Production

Most tutorials show you how to call an LLM API. What they don't show you is what happens when that integration is processing 50,000 requests a day in a real enterprise environment, where downtime costs money and "it worked on my machine" isn't good enough.

This workshop is built entirely from production experience. I've spent the last two years embedding Claude and OpenAI into large-scale Laravel applications, and I've made enough mistakes to save you from making your own. We'll go beyond the happy path and get into the messy, practical reality of running AI features at scale.

We'll work through how to structure LLM calls properly inside Laravel using queues and jobs, so your app doesn't grind to a halt waiting on a model response. We'll talk about prompt versioning, because models change, and if you're not managing that, your feature will quietly break in ways that are very hard to debug. We'll look at caching strategies that meaningfully cut API costs without sacrificing quality, and we'll cover observability: how to log, monitor, and alert on AI features the same way you would any other critical service.

We'll also spend time on graceful degradation, because the question isn't if an LLM call will fail, it's what your app does when it does.

By the end of the session, you'll have a set of patterns you can take back to your own Laravel codebase and start using straight away. No machine learning background needed, just solid PHP instincts and a willingness to get your hands dirty.

Data Saturdays Sofia 2025

October 2025 Sofia, Bulgaria

Michigan Technology Conference 2025

March 2025 Pontiac, Michigan, United States

National DevOps Conference 2024

Power of Minimalism in DevOps 2024

October 2024 London, United Kingdom

DevOps Oxford

Minimalism in DevOps 2024

October 2024 Oxford, United Kingdom

PHP Sussex & PHP Oxford

A Software Engineer’s Guide to Raising Future-Ready Kids

September 2024 Brighton, United Kingdom

PHP Stoke

Minimalism in DevOps 2023

August 2023 Stoke-on-Trent, United Kingdom

National DevOps Conference 2021

Are You a Modern DevOps Engineer?

October 2021 London, United Kingdom

London Java Community & PHP Vegas

Modern Software Developer Best Practices 2020

June 2020 London, United Kingdom
