Indranil Chandra

Architect ML & Data Engineer @ Upstox

Mumbai, India

Actions

Area of Expertise

Finance & Banking
Health & Medical

Topics

Machine Leaning
Generative AI
Artcial Intelligence
Data Science & AI

LLM Evals: the new CI/CD for GenAI products

Shipping GenAI without evals is like deploying code without tests… exciting for the demo, terrifying in production.

In this talk, I’ll draw from building GenAI-powered products in highly regulated industries, where evals aren’t optional add-ons but gating checks before anything ships. We’ll go beyond academic metrics and dive into what actually matters in production:
• Trust checks: tracing if a RAG answer really comes from the retrieved document.
• Safety checks: catching hallucinations and “confidently wrong” outputs before they reach customers.
• Compliance checks: stress-testing prompts against adversarial queries like “Can I bypass SEBI rules?” or “How do I insider trade?”
• Continuity checks: running nightly synthetic datasets and regression tests to flag drift when embeddings/models/prompts change.
• Governance checks: monitoring prompts like code — with versioning, AB testing, observability, and guardrails against injection, poisoning, or leakage.

The session will be interactive: I’ll demo a lightweight eval harness that continuously probes a live RAG app with adversarial and compliance-sensitive queries. The audience will see in real time how evals flag failure modes that accuracy metrics alone miss.

The novelty is simple: treating evals not as a research afterthought, but as a first-class DevOps layer. By the end, you’ll walk away with practical patterns to:
• Treat prompts as first-class artifacts with CI/CD discipline.
• Embed guardrails and governance hooks alongside evals.
• Graduate GenAI systems from flashy prototypes to reliable, compliant products.

Please Explain, AI!

We’re comfortable when AI gives the right answer. But can we trust why it gave that answer? This talk dives into the crux of LLM adoption - "explainability".

I’ll walk through a practical explainability stack you can actually ship into production:
• Token attribution: IG, SHAP to see which words mattered.
• Attention viz. & prompt tracing: making hidden attention flows visible.
• Logit-lens & neuron probes: surfacing what layers really “know.”
• Counterfactual testing: nudging inputs to test model sensitivity.
• Chain-of-thought: when to trust it, and when it’s a dangerous mirage.

The novelty is simple: explainability as a design layer for real products, not a research toy. This will be an interactive session: we’ll probe a live model (one commercial LLM API vs a small open-source SLM) together and watch its reasoning “light up” across layers. By the end, you’ll leave with practical patterns to embed transparency into GenAI systems, so trust is architected in, not retrofitted later.

See ya SEO…Hello GEO!

Search is dead, long live generation. We’re entering a world where your brand isn’t discovered by Google’s crawlers but by GPT’s responses. With Google starting to roll out AI Overviews, and GPTs replacing keyword search, this shift is happening faster than SEO playbooks can adapt. This talk explores the transition from Search Engine Optimisation (SEO) to Generative Engine Optimisation (GEO), and what that means for how we design content, retrieval, and schemas.

I’ll connect this shift to my earlier writing (From Google to GPT: The Branding Shift for an AI-native World - https://indranildchandra.medium.com/from-google-to-gpt-the-branding-shift-for-an-ai-native-world-454bf26e8c0e), and show how content pipelines now need to be designed not just for ranking but for retrieval into RAG/LLM workflows. That means: writing with embeddings in mind, structuring knowledge for semantic chunking, and designing metadata that LLMs actually use.
We’ll do a live experiment: I’ll pit a traditional SEO-optimised page vs a GEO-optimised knowledge base and trace how differently they surface in GPT responses. The audience will see how small changes in content design can make or break visibility in generation-first platforms.

This isn’t a marketing talk; it’s a systems architect’s take on growth in the GenAI era, where engineering, retrieval, and branding collide. If SEO made or broke companies in Web2, GEO will decide who wins attention in Web3.

Can you "R-A-G" like a pro?

RAG is everyone’s favorite GenAI trick, but running it in production is where the scars show. In this talk, I’ll share my personal journey of scaling Retrieval-Augmented Generation inside enterprises... where “good enough retrieval” often breaks under freshness requirements, attribution demands, and compliance audits.

We’ll unpack the messy but crucial details: choosing embeddings that don’t silently drop context, tracing answers back to their sources (provenance), tuning retrieval freshness vs. latency (TTR), and the tradeoffs between recall and precision. I’ll also showcase two techniques I’ve published about recently: dynamic semantic chunking (to avoid context dilution) and adaptive-k retrieval (a smarter way to balance recall vs precision without extra latency).

The session won’t be a slide monologue. Together with the audience, we’ll stress-test a live RAG system and watch how tweaks in chunking, embeddings, or retrieval settings alter both performance and trust. The big takeaway: RAG isn’t about bolting a vector DB to an LLM, it’s about engineering provenance-aware retrieval pipelines that survive the real world.

Devfest Mumbai 2019 Sessionize Event

September 2019 Mumbai, India

Indranil Chandra

Architect ML & Data Engineer @ Upstox

Mumbai, India

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Speaker

Indranil Chandra

Actions

Links

Area of Expertise

Topics

Sessions

LLM Evals: the new CI/CD for GenAI products

Please Explain, AI!

See ya SEO…Hello GEO!

Can you "R-A-G" like a pro?

Events

Devfest Mumbai 2019 Sessionize Event

Indranil Chandra

Links

Actions