Speaker

Dev J. Shah

SWE @TribalScale, GenAI Evangelist (Blogger, Speaker) || 4x Multi-Cloud Certified || Software Engineering, AI Engineering || Linux, Cloud, DevOps

Toronto, Canada

A detail-oriented Software Engineer and GenAI Evangelist passionate about building scalable, cloud-based applications and AI-powered solutions. With a strong focus on engineering best practices and client collaboration, I deliver tailored solutions that meet unique needs. I channel that same passion into helping developers adopt GenAI technologies through technical blogs, videos, and conference talks.

Area of Expertise

  • Information & Communications Technology

Topics

  • Artificial Intelligence
  • Retrieval Augmented Generation (RAG) and LLM Applications
  • AI Engineering
  • LangChain
  • Ollama
  • Azure AI Engineering
  • Generative AI
  • JavaScript
  • Model Context Protocol (MCP)
  • Using AI and LLMs
  • LlamaIndex
  • Vector Database
  • Cosine Similarity
  • Containers
  • Docker
  • Google Devfest
  • Google Developer Groups

Model Context Protocol Unleashed: Integrating with GitHub Copilot

In this session, we will explore the Model Context Protocol (MCP), an open standard developed by Anthropic to bridge AI models with external tools and data sources. You'll learn how MCP enables seamless integration between large language models and development environments, enhancing the capabilities of AI assistants like GitHub Copilot.

We will delve into the MCP workflow, demonstrate how to build an MCP server using available SDKs, and showcase the integration with VS Code Copilot. Through live demos and practical examples, we'll highlight real-world use cases where MCP enhances AI-driven development workflows. By the end of this talk, you'll have a clear understanding of how to leverage MCP to create more intelligent and context-aware AI assistants.
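As a taste of the workflow covered in the session, MCP messages are plain JSON-RPC 2.0. The sketch below shows the shape of a client invoking a server-exposed tool via the `tools/call` method; the `search_issues` tool name and its arguments are hypothetical examples, not part of any real server.

```python
import json

# An MCP client (e.g. an editor's Copilot integration) invokes a
# server-exposed tool with the JSON-RPC method "tools/call".
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_issues",  # hypothetical tool exposed by a server
        "arguments": {"query": "login bug", "limit": 5},
    },
}

# The server replies with a result carrying content blocks.
response = {
    "jsonrpc": "2.0",
    "id": 1,  # matches the request id
    "result": {
        "content": [{"type": "text", "text": "3 matching issues found"}]
    },
}

# On the wire, both sides exchange serialized JSON.
wire = json.dumps(request)
print(wire)
```

The MCP SDKs demonstrated in the session generate and handle these messages for you; the point here is only that the protocol underneath is ordinary JSON-RPC.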

The Cosine Similarity Math: Visualizing Embeddings and Retrieval with BigQuery and Gemini

Retrieval-Augmented Generation (RAG) has quickly become one of the most practical approaches for building AI applications, yet many developers treat it as a “black box.” In this talk, I peel back the layers and dive into the math and mechanics that make RAG work.

I begin with a brief overview of RAG and then focus on its two core components: embeddings stored in a vector database and the retrieval of relevant data using cosine similarity. I explain how text is transformed into embeddings, which are arrays of numbers that capture semantic meaning, and compare different methods of generating embeddings, highlighting why vector representations are essential in this context.

Next, I explore cosine similarity as the retrieval mechanism. By plotting embeddings in a 3D graph, I show how measuring the angle (cos θ) between vectors determines relevance.
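The angle measurement above can be sketched in a few lines of Python, with toy 3-D vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score ~1.0; orthogonal vectors score 0.0.
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]   # same direction as v1, just longer
v3 = [-2.0, 1.0, 0.0]  # orthogonal to v1 (dot product is 0)

print(cosine_similarity(v1, v2))  # ~1.0
print(cosine_similarity(v1, v3))  # 0.0
```

Note that cosine similarity ignores vector length and compares only direction, which is why it works well for embeddings of texts of different sizes.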

Finally, I walk through a live coding demo to build a simple RAG pipeline, showing how embeddings are stored in a vector database and retrieved in real time.
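A minimal sketch of that retrieval step, assuming an in-memory list stands in for the vector database (BigQuery in the talk) and hand-crafted 3-D vectors stand in for model-generated embeddings:

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy "vector database": (document, embedding) pairs. Real embeddings
# would come from an embedding model and have hundreds of dimensions.
store = [
    ("Cats are small domesticated felines.", [0.9, 0.1, 0.0]),
    ("Python is a programming language.",    [0.0, 0.2, 0.9]),
    ("Dogs are loyal companions.",           [0.8, 0.3, 0.1]),
]

def retrieve(query_vec, k=2):
    # Rank every stored document by cosine similarity to the query.
    ranked = sorted(store, key=lambda item: cos_sim(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend this vector is the embedding of "tell me about pets":
query = [0.85, 0.2, 0.05]
print(retrieve(query))
```

The two animal sentences are retrieved because their vectors point in nearly the same direction as the query; the programming sentence is filtered out, which is exactly the behavior a RAG pipeline relies on before handing context to the LLM.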

The Math Behind LoRA: Fine-Tuning Google Gemma in Action

Large Language Models (LLMs) rely on weight matrices to calculate probabilities and generate the next token. Traditionally, adapting these models to specific tasks requires "full fine-tuning," a computationally expensive process that demands massive hardware memory to update millions or billions of individual weights.

This presentation demystifies Low-Rank Adaptation (LoRA), a leading Parameter-Efficient Fine-Tuning (PEFT) technique that completely bypasses this hardware bottleneck. We will explore the mathematical "magic" that allows LoRA to achieve the same probability redistributions as full fine-tuning while updating only a tiny fraction of the model's weights.
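To make the savings concrete: LoRA freezes the pretrained weight matrix W and learns a low-rank update ΔW = B·A, where B is (d × r), A is (r × d), and the rank r is far smaller than d. A back-of-the-envelope sketch (the sizes below are illustrative, not Gemma's actual dimensions):

```python
# Illustrative sizes: a d x d weight matrix and a small LoRA rank r.
d, r = 512, 8

full_params = d * d          # weights updated by full fine-tuning
lora_params = d * r + r * d  # weights updated by LoRA (B and A only)

print(full_params)               # 262144
print(lora_params)               # 8192
print(lora_params / full_params) # 0.03125 -> ~3% of the weights
```

At inference time the learned product B·A can be merged back into W (W_eff = W + ΔW), so the adapted model runs with no extra latency.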

Attendees will be guided through the complete lifecycle of this process. We will map out how data moves through forward passes, loss calculations, and backward passes across multiple training epochs, before finally exploring the inference stage where these newly customized models are efficiently loaded and served. The session closes with a demonstration of fine-tuning an LLM, highlighting the difference in its generated output.
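The forward/loss/backward cycle described above can be sketched with NumPy, assuming a toy linear "model" whose frozen weights W gain a learnable low-rank update B @ A (the sizes and synthetic data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 16, 2, 64  # toy sizes: model dim, LoRA rank, batch size

W = rng.normal(size=(d, d))  # frozen pretrained weights (never updated)
B = np.zeros((d, r))         # LoRA init: B = 0, so the update starts at zero
A = rng.normal(size=(r, d))  # LoRA init: random A

# Synthesize a reachable target: "ideal" weights that differ from W
# by a genuinely low-rank update.
B_true = rng.normal(size=(d, r))
A_true = rng.normal(size=(r, d))
X = rng.normal(size=(d, n))
T = (W + B_true @ A_true) @ X

lr, losses = 0.1, []
for epoch in range(300):
    Y = (W + B @ A) @ X                 # forward pass
    err = Y - T
    losses.append(np.mean(err ** 2))    # loss calculation (MSE)
    g = (2 / err.size) * err @ X.T      # backward pass: dL/d(B @ A)
    gB, gA = g @ A.T, B.T @ g           # chain rule into the two factors
    B -= lr * gB                        # update only the adapter weights;
    A -= lr * gA                        # W itself is never touched

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss falls even though W is frozen, because the low-rank pair (B, A) absorbs the needed change, which is the core idea the session demonstrates at LLM scale.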

Orlando Code Camp 2026 Sessionize Event

April 2026 Sanford, Florida, United States

AgentCon Toronto Sessionize Event

March 2026 Toronto, Canada

