0ms Latency Dreams vs. NPU Reality: Engineering a Production-Ready AI Search with Gemini Nano
While "Hello World" demos make On-Device AI look effortless, shipping it into a production app like Strollby reveals a different reality. Traditional search filters are predictable, but they lack the nuance of human intent.
In this session, we go behind the scenes of how we replaced rigid UI filters with an AI-driven Search Parser using the ML Kit Prompt API. We will walk through a candid production case study, documenting the journey of optimizing our natural language entity extraction. You’ll see exactly how we slashed inference time from a sluggish 18.3s to a lightning-fast <1s by mastering the "Art of Prompting".
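To make the "Art of Prompting" concrete, here is a minimal sketch of the two prompting styles the case study contrasts. The exact prompts used at Strollby are not public, so both strings below are illustrative assumptions: a chatty "polite conversation" prompt versus a "structured constraint" prompt that pins the output to a fixed JSON shape and forbids extra prose.

```kotlin
// Hypothetical "Polite Conversation" prompt: verbose, open-ended, and it
// invites the model to reply with free-form prose (slow to decode).
val politePrompt = """
    Hi! Could you please read the following travel search query and, if it
    is not too much trouble, tell me which destination, date and activity
    the user might be interested in? Query: "beach tours tomorrow"
""".trimIndent()

// Hypothetical "Structured Constraint" prompt: the output format is pinned
// and chatter is forbidden, narrowing the model's decoding space.
val structuredPrompt = """
    Extract entities from the query.
    Respond with ONLY this JSON, no prose:
    {"destination": string|null, "date": string|null, "activity": string|null}
    Query: "beach tours tomorrow"
""".trimIndent()

// Fewer input tokens and a constrained output both shrink inference time.
println(politePrompt.length)
println(structuredPrompt.length)
```

The structured variant trades conversational tone for a machine-checkable contract, which is also what makes the downstream JSON mapping reliable.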
Key Production Insights:
The Prompt Evolution: Comparing the "Polite Conversation" approach (18s) with the "Structured Constraint" approach (5s).
Zero-Inference Latency: Implementing Prefix Caching to skip static instructions and jump straight to the user query, including techniques for handling relative time (like "tomorrow") by injecting a dynamic TODAY_DATE context alongside cached prefixes.
The Logic Bridge: Using Kotlin to chain calls and map Nano’s JSON output directly to our experience booking modules.
Streaming for Speed: Using Kotlin Flows and generateContentStream to build a responsive UI that feels instantaneous to the user.
A Production Reality Check: Addressing the "Fragmented Availability" and "Inconsistent Behavior" across the current Android AI ecosystem.
How we tracked down the failure conditions and, guided by user feedback, integrated a hybrid architecture as a fallback.
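The prefix-caching bullet above can be sketched on the prompt-structuring side. This is an assumption-heavy illustration, not the ML Kit caching API itself: the idea is that the static instruction block is byte-identical on every request (so a cache-aware engine can reuse its processed prefix), while only a short dynamic tail carrying TODAY_DATE and the user query changes. A hypothetical app-side helper for relative dates is included as well.

```kotlin
import java.time.LocalDate

// Static instructions: identical on every request, so an engine that
// supports prefix caching (as described in the session) can reuse the
// already-processed prefix and only consume the dynamic tail.
val staticPrefix = """
    You extract search entities for a travel app.
    Respond with ONLY JSON: {"activity": string|null, "date": "YYYY-MM-DD"|null}
""".trimIndent()

// Dynamic tail: injecting TODAY_DATE lets the model resolve relative
// phrases like "tomorrow" into absolute dates.
fun buildPrompt(query: String, today: LocalDate): String =
    "$staticPrefix\nTODAY_DATE: $today\nQuery: \"$query\""

// Hypothetical app-side fallback (not part of any ML Kit API): resolve
// common relative phrases ourselves so a model miss can be corrected.
fun resolveRelativeDate(phrase: String, today: LocalDate): LocalDate? =
    when (phrase.lowercase()) {
        "today" -> today
        "tomorrow" -> today.plusDays(1)
        else -> null
    }

val today = LocalDate.of(2025, 6, 1)
println(buildPrompt("beach tours tomorrow", today))
println(resolveRelativeDate("tomorrow", today)) // 2025-06-02
```

Keeping the prefix byte-stable is the whole trick: any edit to the instruction block, even whitespace, would invalidate the cached portion.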
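The "Logic Bridge" bullet, mapping Nano's JSON output to booking modules, might look roughly like the sketch below. The field names and the booking query format are assumptions, and the regex-based extraction is only there to keep the sketch dependency-free; a real app would use a proper JSON library such as kotlinx.serialization.

```kotlin
// Assumed shape of the model's structured output (the real Strollby
// schema is not public).
data class SearchFilters(val activity: String?, val date: String?)

// Dependency-free extraction of string fields from a flat JSON object.
// Illustration only; production code should use a real JSON parser.
fun parseFilters(json: String): SearchFilters {
    fun field(name: String): String? =
        Regex("\"$name\"\\s*:\\s*\"([^\"]*)\"").find(json)?.groupValues?.get(1)
    return SearchFilters(field("activity"), field("date"))
}

// Hypothetical bridge into an experience-booking module.
fun toBookingQuery(f: SearchFilters): String =
    "experiences?activity=${f.activity ?: "any"}&date=${f.date ?: "any"}"

val nanoOutput = """{"activity": "kayaking", "date": "2025-06-02"}"""
println(toBookingQuery(parseFilters(nanoOutput)))
```

Because the structured-constraint prompt fixes the schema, this mapping layer stays a thin, testable function instead of a pile of heuristics.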
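For the streaming bullet: in the app this is a Kotlin Flow collected from generateContentStream, updating UI state as chunks arrive. To keep this sketch self-contained and dependency-free, a plain Sequence of text chunks stands in for the Flow; the accumulate-and-render pattern is the same.

```kotlin
// Stand-in for the chunked output of generateContentStream (which in the
// real app is a Flow collected in a coroutine).
fun fakeTokenStream(): Sequence<String> =
    sequenceOf("{\"activity\":", " \"kayaking\",", " \"date\": \"2025-06-02\"}")

fun renderIncrementally(chunks: Sequence<String>): String {
    val sb = StringBuilder()
    for (chunk in chunks) {
        sb.append(chunk)
        // In the app, this is where UI state is updated per chunk, so the
        // user sees partial output instead of a long blank wait.
        println("UI update: $sb")
    }
    return sb.toString()
}

println(renderIncrementally(fakeTokenStream()))
```

Even when total generation time is unchanged, emitting partial results makes perceived latency collapse, which is what "feels instantaneous" refers to.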
Dinoy Raj
Product Engineer – Android @ Strollby | Droidcon Uganda ’25 & Droidcon Abu Dhabi ’25 Speaker
Thiruvananthapuram, India