TinyLLMs on the Edge: Running Compressed Language Models on Your Phone
In 2025, AI is breaking free from the cloud. With the rise of model compression, quantization, and optimized runtimes, we can now run compact large language models, known as TinyLLMs, directly on mobile devices, laptops, and even low-power embedded hardware. This shift is changing how we think about AI applications, making them faster, more private, and more accessible to everyone.
In this lightning talk, we’ll explore the new possibilities of running LLMs on the edge. We’ll cover the frameworks and toolchains that make this possible today, including ONNX Runtime Mobile, TensorFlow Lite, and Apple’s MLX, and discuss how developers can deploy sub-300M parameter models for real-world use cases. From offline summarization and chat assistants to real-time text classification and personal productivity tools, TinyLLMs enable applications that no longer depend on constant connectivity or expensive cloud infrastructure.
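To give a flavor of what on-device deployment looks like, here is a minimal greedy-decoding sketch using ONNX Runtime’s Python API (ONNX Runtime Mobile exposes the same InferenceSession concept through its Android and iOS bindings). It assumes a sub-300M model already exported to ONNX with a single "input_ids" input and a "logits" output; the file name and the SmolLM-135M tokenizer are illustrative placeholders, not specifics from the talk.

```python
# Minimal on-device-style inference sketch with ONNX Runtime.
# Assumptions: the model was exported to ONNX with one input ("input_ids")
# and one output ("logits"); "tiny_llm_int8.onnx" is a placeholder path.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

MODEL_PATH = "tiny_llm_int8.onnx"  # hypothetical quantized export
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

session = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])

def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy decoding: pick the most likely next token, feed it back in."""
    ids = tokenizer(prompt, return_tensors="np")["input_ids"].astype(np.int64)
    for _ in range(max_new_tokens):
        logits = session.run(["logits"], {"input_ids": ids})[0]
        next_id = int(logits[0, -1].argmax())  # most likely next token
        if next_id == tokenizer.eos_token_id:
            break
        ids = np.concatenate([ids, [[next_id]]], axis=1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(generate("Summarize: TinyLLMs run entirely on-device."))
```

On a phone you would drive the same session through the Java or Swift bindings, but the decoding loop is identical.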
We’ll also look at key challenges such as memory constraints, model quantization, and the trade-offs between accuracy and efficiency, and discuss where edge-based AI is heading next.
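As a concrete example of that accuracy/efficiency trade-off, the sketch below applies post-training dynamic quantization with ONNX Runtime’s quantization tooling: 32-bit float weights become 8-bit integers, roughly quartering the file size at some model-dependent cost in output quality. The file names are placeholders.

```python
# Post-training dynamic quantization sketch: int8 weights, fp32 activations.
# Roughly a 4x reduction in model size; the accuracy impact is model-dependent
# and should be checked on a held-out task. File names are placeholders.
import os
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="tiny_llm_fp32.onnx",   # hypothetical full-precision export
    model_output="tiny_llm_int8.onnx",  # quantized model, ~4x smaller
    weight_type=QuantType.QInt8,        # store weights as signed 8-bit ints
)

for path in ("tiny_llm_fp32.onnx", "tiny_llm_int8.onnx"):
    print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
```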
By the end of the session, attendees will walk away with:
A clear understanding of why TinyLLMs matter in 2025,
A practical roadmap for experimenting with on-device AI,
Inspiration to build privacy-first, low-latency applications that fit in the palm of your hand.
If you’ve ever wanted to shrink an LLM to fit in your pocket, this talk is for you.

Prachi Kedar
AI/ML Engineer | Computer Vision & Generative AI Enthusiast
Milan, Italy