The Math Behind LoRA: Fine-Tuning Google Gemma in Action

Large Language Models (LLMs) rely on weight matrices to calculate probabilities and generate the next token. Traditionally, fine-tuning these models for specific tasks requires "full fine-tuning," a computationally expensive process that demands massive amounts of hardware memory to update millions or billions of individual weights.
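
For intuition, here is a minimal sketch (illustrative only, not the talk's code) of how an output weight matrix turns a hidden state into next-token probabilities; the sizes and random values are placeholders.

```python
# Illustrative only: a weight matrix maps a hidden state to vocabulary logits,
# and softmax turns those logits into next-token probabilities.
import numpy as np

vocab_size, hidden_dim = 32_000, 2_048                 # placeholder sizes
W = np.random.randn(vocab_size, hidden_dim) * 0.01     # output (unembedding) weight matrix
h = np.random.randn(hidden_dim)                        # hidden state at the current position

logits = W @ h                                         # one score per vocabulary token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                   # probability distribution over next tokens
next_token = int(probs.argmax())                       # greedy pick of the next token
```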

This presentation demystifies Low-Rank Adaptation (LoRA), a leading Parameter-Efficient Fine-Tuning (PEFT) technique that bypasses this hardware bottleneck entirely. We will explore the mathematical "magic" that allows LoRA to achieve the same probability redistributions as full fine-tuning while updating only a tiny fraction of the model's weights.
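
As a rough sketch of the idea (using the notation of the original LoRA paper, which may differ from the slides): the pretrained weight matrix is frozen, and only a low-rank update is learned.

```latex
% LoRA update for a frozen pretrained weight W_0 (notation assumed from the LoRA paper)
h = W_0 x + \Delta W\, x = W_0 x + \frac{\alpha}{r} B A\, x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Only the r(d + k) entries of B and A are trained instead of the d x k entries of W_0; for d = k = 4096 and r = 8, that is roughly 65 K trainable parameters per adapted matrix instead of about 16.8 M.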

Attendees will be guided through the complete lifecycle of this process. We will map out how data moves through forward passes, loss calculations, and backward passes across multiple training epochs, before finally exploring the inference stage, where these newly customized models are efficiently loaded and served. The session closes with a demonstration of fine-tuning an LLM, comparing its generated output before and after fine-tuning.
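
As a taste of what the demo involves, here is a minimal sketch of attaching a LoRA adapter to a causal language model with the Hugging Face PEFT library; the model name, target modules, and hyperparameters are illustrative assumptions, not necessarily those used in the session.

```python
# Sketch: wrapping a pretrained causal LM with a LoRA adapter via Hugging Face PEFT.
# "google/gemma-2b" and the hyperparameters below are assumed for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank matrices B and A
    lora_alpha=16,                         # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows trainable vs. total parameter counts
```

After training, the adapter weights can be kept separate or merged back into the base weights, which is part of why loading and serving these customized models is efficient.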

Dev J. Shah

SWE @TribalScale, GenAI Evangelist (Blogger, Speaker) || 4x Multi-Cloud Certified || Software Engineering, AI Engineering || Linux, Cloud, DevOps

Toronto, Canada
