Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

The AI industry is shifting from bigger to better. As companies chase efficiency and performance, quantization has emerged as one of the most effective ways to make models smaller, faster, and more affordable—without crippling accuracy. With recent breakthroughs from teams like DeepSeek proving that optimization can shake entire markets, developers are rethinking what "efficient AI" really means. The real question isn't whether we can make models smarter... it's whether we can make them smarter per watt, per dollar, and per millisecond.

This session explores the full lifecycle of model quantization and how it powers the rise of Small Language Models (SLMs) and agentic AI systems. We'll cover how quantization works, when it pays off, and how it changes deployment tradeoffs across CPUs, GPUs, and AI accelerators. Attendees will walk away with practical techniques for compressing models, tuning quantization-aware training, and deploying specialized SLMs in multi-agent systems using the Agent2Agent protocol. The end goal: maximize hardware potential while staying responsive, without breaking the bank on infrastructure.
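To make the core idea concrete: post-training quantization maps floating-point weights onto a small integer range with a scale factor, trading a bounded amount of precision for a 4x reduction in memory (float32 to int8). The sketch below is a minimal, framework-free illustration of symmetric per-tensor int8 quantization; the function names and the per-tensor scheme are illustrative assumptions, not tied to any specific library the session may use:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    Maps the float range [-max|w|, +max|w|] onto [-127, 127]
    using a single scale factor for the whole tensor.
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error per weight is at most half the quantization step.
max_err = float(np.max(np.abs(w - w_hat)))
print(q.dtype, max_err <= scale / 2 + 1e-6)
```

Per-channel scales and quantization-aware training (simulating this rounding during fine-tuning so the model adapts to it) reduce the accuracy loss further; those tradeoffs are among the topics the session covers.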

David vonThenen

AI/ML Leader | Keynote Speaker | OSS Engineer & Developer Advocate | Agentic AI, Deep Learning, Production AI | Python, Go, C++

Long Beach, California, United States

