Serving LLMs at Scale: Understanding the Difference Between Ollama and vLLM
Deploying Large Language Models (LLMs) is becoming a core skill for developers, but most people still rely on “black-box” APIs. In reality, choosing the right serving engine can drastically change latency, throughput, cost, and user experience.
In this session, I’ll take the audience on a practical, developer-oriented journey comparing Ollama and vLLM, two of the most popular open-source solutions for running LLMs locally and in production.
We will break the concepts down simply, in Darija, grounded in real benchmarks:
+ What we will learn
- How LLM serving actually works (tokenization, batching, memory planning, GPU scheduling)
- Why vLLM is extremely fast: continuous batching, PagedAttention, optimized sampling (see the toy scheduler sketch after this list)
- Why Ollama is extremely easy: UX-first design, model packaging, Docker-like simplicity
- When to choose Ollama (local dev, prototyping, on-device apps, offline assistants)
- When to choose vLLM (APIs, high-throughput apps, RAG, production environments)
- Live architecture comparison: performance, GPU/CPU usage, limits, ecosystem
- Practical deployment demo: serving the same model with both tools (see the client sketch below)
- Real-world lessons from building AI educational tools and agents
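
To make the continuous-batching point above concrete, here is a minimal toy simulation. This is not vLLM code: the request lengths, the batch size, and the one-token-per-decode-step model are made-up assumptions for illustration. It compares a static batcher, which holds all GPU slots until the longest request in the batch finishes, against a continuous scheduler that admits a waiting request the moment a slot frees up.

```python
import random

random.seed(0)

# Each request needs a random number of decode steps (output tokens) -- toy values.
requests = [random.randint(8, 64) for _ in range(32)]
BATCH = 8  # assume the GPU decodes 8 sequences per step

def static_batching(reqs):
    """Fixed batches: the whole batch occupies the GPU until its longest member finishes."""
    steps = 0
    for i in range(0, len(reqs), BATCH):
        steps += max(reqs[i:i + BATCH])  # short requests idle while waiting for the longest
    return steps

def continuous_batching(reqs):
    """Refill freed slots every step, the way vLLM schedules at token granularity."""
    pending = list(reqs)
    running = []
    steps = 0
    while pending or running:
        # Admit waiting requests into any free slots.
        while pending and len(running) < BATCH:
            running.append(pending.pop())
        # One decode step: every running sequence emits one token; finished ones leave.
        running = [r - 1 for r in running if r > 1]
        steps += 1
    return steps

print("static    :", static_batching(requests), "decode steps")
print("continuous:", continuous_batching(requests), "decode steps")
```

With these made-up lengths the continuous scheduler finishes in noticeably fewer steps, because no GPU slot sits idle waiting for a batch-mate. That is the intuition behind vLLM's throughput numbers; PagedAttention attacks the matching memory problem by allocating KV-cache in small pages rather than one contiguous max-length block per request.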
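As a preview of the deployment demo: both tools can expose an OpenAI-compatible HTTP API (Ollama at localhost:11434/v1, vLLM at localhost:8000/v1 by default), so the same client code can target either server just by swapping the base URL. A minimal sketch, assuming the `openai` Python package is installed and that a model is already pulled/served; the model names below are placeholders, substitute whatever you actually run:

```python
from openai import OpenAI

# Default local endpoints for `ollama serve` and `vllm serve`; model names are examples.
BACKENDS = {
    "ollama": ("http://localhost:11434/v1", "llama3"),
    "vllm": ("http://localhost:8000/v1", "meta-llama/Meta-Llama-3-8B-Instruct"),
}

for name, (base_url, model) in BACKENDS.items():
    # Neither server checks the API key by default, so any placeholder works.
    client = OpenAI(base_url=base_url, api_key="not-needed")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Explain PagedAttention in one sentence."}],
        max_tokens=64,
    )
    print(f"[{name}] {reply.choices[0].message.content}")
```

The point of the demo is exactly this symmetry: once both engines speak the same API, the choice between them becomes a question of throughput, hardware, and operational needs rather than client code.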
+ Why this talk matters
- Most developers in Morocco use LLMs but don’t know:
  - how model serving really works,
  - how to reduce latency,
  - which tool fits their use case.
This session gives the community actionable, Darija-friendly knowledge for building faster, cheaper, and more scalable AI apps.
+ Target audience
Beginner → Advanced developers interested in:
- AI & LLMs
- backend & infra
- building AI products
- RAG, agents, inference optimization
By the end, the audience will leave with a clear mental map of how to run LLMs locally and in production, and the confidence to pick the right stack for their next AI project.
Mohannad Tazi
ML/AI Engineer, AI Coach & Researcher. Building intelligent systems with LLMs, RAG, and Autonomous Agents.
Casablanca, Morocco