Serving LLMs at Scale: Understanding the Difference Between Ollama and vLLM
Deploying Large Language Models (LLMs) is becoming a core skill for developers, but most people still rely on “black-box” APIs. In reality, choosing the right serving engine can drastically change latency, throughput, cost, and user experience.
In this session, I’ll take the audience on a practical, developer-oriented journey comparing Ollama and vLLM, two of the most popular open-source solutions for running LLMs locally and in production.
We will break the concepts down simply, in Darija, grounded in real benchmarks:
+ What we will learn
- How LLM serving actually works (tokenization, batching, memory planning, GPU scheduling)
- Why vLLM is extremely fast: continuous batching, PagedAttention, optimized sampling (see the toy scheduler sketch after this list)
- Why Ollama is extremely easy: UX-first design, model packaging, Docker-like simplicity
- When to choose Ollama (local dev, prototyping, on-device apps, offline assistants)
- When to choose vLLM (APIs, high-throughput apps, RAG, production environments)
- Live architecture comparison: performance, GPU/CPU usage, limits, ecosystem
- Practical deployment demo: serving the same model with both tools (see the client sketch below)
- Real-world lessons from building AI educational tools and agents
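
To make the continuous-batching point above concrete, here is a minimal toy simulation. This is not vLLM code: the request lengths, the batch size, and the one-token-per-decode-step model are made-up assumptions for illustration. It compares a static batcher, which holds all GPU slots until the longest request in the batch finishes, against a continuous scheduler that admits a waiting request the moment a slot frees up.

```python
import random

random.seed(0)

# Each request needs a random number of decode steps (output tokens) -- toy values.
requests = [random.randint(8, 64) for _ in range(32)]
BATCH = 8  # assume the GPU decodes 8 sequences per step

def static_batching(reqs):
    """Fixed batches: the whole batch occupies the GPU until its longest member finishes."""
    steps = 0
    for i in range(0, len(reqs), BATCH):
        steps += max(reqs[i:i + BATCH])  # short requests idle while waiting for the longest
    return steps

def continuous_batching(reqs):
    """Refill freed slots every step, the way vLLM schedules at token granularity."""
    pending = list(reqs)
    running = []
    steps = 0
    while pending or running:
        # Admit waiting requests into any free slots.
        while pending and len(running) < BATCH:
            running.append(pending.pop())
        # One decode step: every running sequence emits one token; finished ones leave.
        running = [r - 1 for r in running if r > 1]
        steps += 1
    return steps

print("static    :", static_batching(requests), "decode steps")
print("continuous:", continuous_batching(requests), "decode steps")
```

With these made-up lengths the continuous scheduler finishes in noticeably fewer steps, because no GPU slot sits idle waiting for a batch-mate. That is the intuition behind vLLM's throughput numbers; PagedAttention attacks the matching memory problem by allocating KV-cache in small pages rather than one contiguous max-length block per request.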
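As a preview of the deployment demo: both tools can expose an OpenAI-compatible HTTP API (Ollama at localhost:11434/v1, vLLM at localhost:8000/v1 by default), so the same client code can target either server just by swapping the base URL. A minimal sketch, assuming the `openai` Python package is installed and that a model is already pulled/served; the model names below are placeholders, substitute whatever you actually run:

```python
from openai import OpenAI

# Default local endpoints for `ollama serve` and `vllm serve`; model names are examples.
BACKENDS = {
    "ollama": ("http://localhost:11434/v1", "llama3"),
    "vllm": ("http://localhost:8000/v1", "meta-llama/Meta-Llama-3-8B-Instruct"),
}

for name, (base_url, model) in BACKENDS.items():
    # Neither server checks the API key by default, so any placeholder works.
    client = OpenAI(base_url=base_url, api_key="not-needed")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Explain PagedAttention in one sentence."}],
        max_tokens=64,
    )
    print(f"[{name}] {reply.choices[0].message.content}")
```

The point of the demo is exactly this symmetry: once both engines speak the same API, the choice between them becomes a question of throughput, hardware, and operational needs rather than client code.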
+ Why this talk matters
- Most developers in Morocco use LLMs but don’t know:
  - how model serving really works,
  - how to reduce latency,
  - which tool fits their use case.
This session gives the community actionable, Darija-friendly knowledge for building faster, cheaper, and more scalable AI apps.
+ Target audience
Beginner → Advanced developers interested in:
- AI & LLMs
- backend & infra
- building AI products
- RAG, agents, inference optimization
By the end, the audience will leave with a clear mental map of how to run LLMs locally and in production, and the confidence to pick the right stack for their next AI project.
Mohannad Tazi
ML/AI Engineer, AI Coach & Researcher. Building intelligent systems with LLMs, RAG, and Autonomous Agents.
Casablanca, Morocco