
Mohannad Tazi

ML/AI Engineer, AI Coach & Researcher --- Building intelligent systems with LLMs, RAG and Autonomous Agents

Casablanca, Morocco


Mohannad Tazi is an AI/ML Engineer specializing in LLMs, RAG systems, and autonomous AI agents. He builds practical AI solutions for education and productivity, contributes to open-source model evaluations, and regularly speaks at national and international tech events. He has trained and coached 500+ students in AI and Data Science, and focuses on making complex AI concepts accessible to Moroccan developers in Darija.

Area of Expertise

  • Business & Management
  • Government, Social Sector & Education
  • Information & Communications Technology
  • Manufacturing & Industrial Materials

Topics

  • Artificial Intelligence
  • Machine Learning & AI
  • Smart Cities
  • EdTech
  • Developing Artificial Intelligence Technologies
  • AI for Startups

Serving LLMs at Scale: Understanding the Difference Between Ollama and vLLM

Deploying Large Language Models (LLMs) is becoming a core skill for developers, but most people still rely on “black-box” APIs. In reality, choosing the right serving engine can drastically change latency, throughput, cost, and user experience.

In this session, I’ll take the audience on a practical, developer-oriented journey comparing Ollama and vLLM, two of the most popular open-source solutions for running LLMs locally and in production.
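One reason the two engines are directly comparable: both Ollama and vLLM can expose an OpenAI-compatible chat endpoint, so the same client payload works against either server. A minimal sketch (the base URLs are typical defaults and the `llama3` model name is illustrative, not from the abstract):

```python
import json

# Both servers speak the OpenAI chat-completions protocol; only the
# base URL (and model name) changes between them. Defaults may differ
# in your setup.
OLLAMA_BASE = "http://localhost:11434/v1"  # started with: ollama serve
VLLM_BASE = "http://localhost:8000/v1"     # started with: vllm serve <model>

def chat_payload(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for POST <base>/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = json.dumps(chat_payload("llama3", "Explain batching in one line."))
```

Swapping engines then means changing one URL, which is what makes a like-for-like latency/throughput comparison possible.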

We will break the concepts down simply, in Darija, grounded in real benchmarks:

What we will learn

  • How LLM serving actually works (tokenization, batching, memory planning, GPU scheduling)
  • Why vLLM is extremely fast: continuous batching, PagedAttention, optimized sampling
  • Why Ollama is extremely easy: UX-first design, model packaging, Docker-like simplicity
  • When to choose Ollama (local dev, prototyping, on-device apps, offline assistants)
  • When to choose vLLM (APIs, high-throughput apps, RAG, production environments)
  • Live architecture comparison: performance, GPU/CPU usage, limits, ecosystem
  • Practical deployment demo: serving the same model using both tools
  • Real-world lessons from building AI educational tools and agents
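Much of vLLM's speed advantage comes from continuous batching. A toy scheduler (my own simplification; real vLLM also manages KV-cache memory via PagedAttention) makes the intuition concrete: with static batching, a batch holds the GPU until its longest sequence finishes, while continuous batching refills a slot the moment any sequence ends.

```python
import heapq

def static_batch_steps(lengths, batch_size):
    """Static batching: each batch occupies the GPU until its
    longest sequence finishes; short sequences waste their slots."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batch_steps(lengths, batch_size):
    """Continuous batching (vLLM-style, simplified): as soon as a
    sequence finishes, a waiting request takes its slot mid-flight."""
    pending = list(lengths)
    finish_times = []  # min-heap of decode steps at which active sequences end
    now = 0
    while pending or finish_times:
        while pending and len(finish_times) < batch_size:
            heapq.heappush(finish_times, now + pending.pop(0))
        now = heapq.heappop(finish_times)
    return now

# One long request plus three short ones, two GPU slots:
lengths = [10, 2, 2, 2]
print(static_batch_steps(lengths, 2))      # 12 decode steps
print(continuous_batch_steps(lengths, 2))  # 10 decode steps
```

The gap widens as request lengths become more uneven, which is exactly the traffic pattern real chat APIs see.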

Why this talk matters

Most developers in Morocco use LLMs but don’t know:

  • how model serving really works,
  • how to reduce latency,
  • or which tool fits their use case.

This session gives the community actionable, Darija-friendly knowledge, enabling them to build faster, cheaper, more scalable AI apps.

Target audience

Beginner to advanced developers interested in:

  • AI & LLMs
  • backend & infra
  • building AI products
  • RAG, agents, inference optimization

By the end, the audience will leave with a clear mental map of how to run LLMs locally and in production, and the confidence to pick the right stack for their next AI project.

BlablaConf 2026 🇲🇦 — upcoming Sessionize event, February 2026


