Hybrid AI in Flutter — From Cloud to On-Device with Firebase and flutter_gemma

Building AI-powered applications no longer requires the cloud. Lightweight open models such as Gemma, DeepSeek, and Qwen can now run directly on mobile devices and in browsers. Once the model is downloaded, inference needs no internet connection, incurs no per-request costs, and keeps user data on the device.

This talk explores on-device AI (Edge AI) as an emerging architectural pattern for Flutter developers. We'll start with Firebase AI Logic for cloud-based inference using Gemini, then move to running models locally with flutter_gemma, covering the full setup from model download to GPU-accelerated inference on Android, iOS, Web, and desktop.

But this isn't just about simple text generation. We'll dive into full agent capabilities running entirely on device. On-device function calling lets the model interact with local APIs (contacts, calendar, sensors) without cloud round trips. On-device RAG (Retrieval-Augmented Generation) uses EmbeddingGemma for vector embeddings and a local VectorStore, enabling context-aware answers from documents, notes, and emails that never leave the device.
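
The retrieval step behind on-device RAG can be sketched as a nearest-neighbour search over stored embeddings, followed by prompt augmentation. This is a minimal sketch in TypeScript rather than the Dart used in the talk, and every name in it (`InMemoryVectorStore`, `buildRagPrompt`, the toy three-dimensional vectors) is a hypothetical stand-in: it is not the flutter_gemma VectorStore API, and in a real app the embeddings would come from a model such as EmbeddingGemma instead of being hard-coded.

```typescript
// Hypothetical in-memory vector store illustrating RAG retrieval.
// Embeddings are supplied directly; a real app would compute them with an
// embedding model (e.g. EmbeddingGemma) on device.

interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private chunks: StoredChunk[] = [];

  add(chunk: StoredChunk): void {
    this.chunks.push(chunk);
  }

  // Returns the k chunks most similar to the query embedding, best first.
  search(query: number[], k: number): StoredChunk[] {
    return this.chunks
      .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((entry) => entry.chunk);
  }
}

// Builds the augmented prompt: retrieved context first, then the question.
function buildRagPrompt(question: string, context: StoredChunk[]): string {
  const contextBlock = context.map((c) => `- ${c.text}`).join("\n");
  return `Answer using only this context:\n${contextBlock}\n\nQuestion: ${question}`;
}

// Usage with toy vectors (a real embedding has hundreds of dimensions).
const store = new InMemoryVectorStore();
store.add({ id: "note-1", text: "Dentist appointment on Friday", embedding: [0.9, 0.1, 0.0] });
store.add({ id: "note-2", text: "Grocery list: milk, eggs", embedding: [0.1, 0.9, 0.0] });
const hits = store.search([1.0, 0.0, 0.0], 1);
console.log(hits[0].id); // prints "note-1"
console.log(buildRagPrompt("When is my dentist appointment?", hits));
```

A linear scan like this is fine for a few thousand personal notes; larger on-device corpora would typically need an indexed store.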

The core of the talk is hybrid AI architecture: knowing when to run on device, when to use the cloud, and how to combine both to get the best of both worlds. We'll implement practical fallback strategies: local-first for privacy and offline capability, cloud-first for power and scale, with automatic switching between the two.
You'll see how a single AIService interface can hide this complexity, letting your app adapt seamlessly to network conditions and task requirements.
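
One way such an interface might look is sketched below in TypeScript (the talk itself uses Dart). `AIService`, `HybridAIService`, `Strategy`, and `StubService` are hypothetical names, not Firebase AI Logic or flutter_gemma APIs; the stubs stand in for a cloud Gemini backend and an on-device Gemma backend, and the `online` flag stands in for a real connectivity or model-availability check.

```typescript
// Hypothetical abstraction: every backend (cloud or on-device) implements it.
interface AIService {
  readonly name: string;
  isAvailable(): Promise<boolean>;
  generate(prompt: string): Promise<string>;
}

type Strategy = "localFirst" | "cloudFirst";

// Hybrid service that is itself an AIService, so callers never see the switch.
class HybridAIService implements AIService {
  readonly name = "hybrid";

  constructor(
    private readonly local: AIService,
    private readonly cloud: AIService,
    private readonly strategy: Strategy,
  ) {}

  async isAvailable(): Promise<boolean> {
    return (await this.local.isAvailable()) || (await this.cloud.isAvailable());
  }

  // Tries backends in the order the strategy dictates, falling back when a
  // backend is unavailable or throws during generation.
  async generate(prompt: string): Promise<string> {
    const order =
      this.strategy === "localFirst" ? [this.local, this.cloud] : [this.cloud, this.local];
    const errors: string[] = [];
    for (const backend of order) {
      if (!(await backend.isAvailable())) {
        errors.push(`${backend.name}: unavailable`);
        continue;
      }
      try {
        return await backend.generate(prompt);
      } catch (e) {
        errors.push(`${backend.name}: ${e instanceof Error ? e.message : String(e)}`);
      }
    }
    throw new Error(`All backends failed (${errors.join("; ")})`);
  }
}

// Stub backend for demonstration only; `online` simulates connectivity.
class StubService implements AIService {
  constructor(readonly name: string, private online: boolean) {}

  async isAvailable(): Promise<boolean> {
    return this.online;
  }

  async generate(prompt: string): Promise<string> {
    return `[${this.name}] ${prompt}`;
  }
}

// Cloud-first, but the cloud is "offline", so the on-device model answers.
const local = new StubService("gemma-on-device", true);
const cloud = new StubService("gemini-cloud", false);
const ai = new HybridAIService(local, cloud, "cloudFirst");
void ai.generate("Hello").then((reply) => console.log(reply)); // prints "[gemma-on-device] Hello"
```

The strategy only changes the order in which backends are tried; the fallback rule itself (skip unavailable backends, move on after an error) is shared by local-first and cloud-first, which keeps the switching logic in one place.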

We'll discuss honest trade-offs along the way: model size vs. capability, performance vs. privacy, latency vs. quality, and when each approach actually makes sense in production. Not every task needs a 100B-parameter model in the cloud, and not every task can be handled by a 1B-parameter model on device; the art is in knowing where to draw the line.

Expect practical code examples, real architecture decisions, and a working hybrid chat application that combines cloud Gemini with on-device Gemma, complete with streaming responses, embeddings, and RAG, all in pure Dart and Flutter.

Sasha Denisov

Chief Software Engineer at EPAM; Google Developer Expert (GDE) for AI, Flutter, Dart and Firebase

Berlin, Germany
