Inside Google COSMO — The New Architecture of On-Device AI Agents
Deep-dive into COSMO, Google's next-gen AI assistant accidentally leaked before I/O 2026. Full breakdown of the 3-mode architecture: Gemini Nano + PI server + Hybrid routing.

On May 1, 2026, Google accidentally published a 1.13 GB experimental app called COSMO to the Play Store, then quickly pulled it. Two weeks before Google I/O, this leak revealed the hybrid architecture of next-generation AI assistants.
What Happened
In the early hours of May 1, 2026, Google Research's official Play Store account published a package named com.google.research.air.cosmo. The 1.13 GB download size stood out, and the description read "experimental AI assistant."
Within hours, Google quietly removed the listing, but not before journalists had downloaded and reverse-engineered it. The research.air segment of the package name suggests a next-generation assistant prototype from a Google Research group abbreviated AIR, though Google has not said what the abbreviation stands for.
With Google I/O 2026 scheduled for mid-May, this looks like an accidental two-week-early reveal of a planned announcement.
The Core Insight — Three Fulfillment Models
The first thing that catches your eye in COSMO's settings is a user-selectable three-mode processing system:
| Mode | Behavior | Use Case |
|---|---|---|
| Hybrid | PI server when online, Nano when offline | Default (assumed), general users |
| PI Only | Always uses server PI model | Quality-first, data sharing OK |
| Nano Only | Always uses local Gemini Nano | Privacy-first, works offline |
What "PI" stands for hasn't been officially defined. Personal Intelligence is the most likely interpretation, referring to server-side Gemini models (likely Gemini 2.x Pro or dedicated infrastructure).
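The three modes in the table can be sketched as a small routing function. This is only an illustration of the described behavior: the names `FulfillmentMode`, `Backend`, and `selectBackend` are my own, and treating "PI Only while offline" as unserviceable is an assumption, since the leak doesn't say what COSMO does in that case.

```kotlin
// Hypothetical names throughout: COSMO's real implementation is not public.
enum class FulfillmentMode { HYBRID, PI_ONLY, NANO_ONLY }

enum class Backend { PI_SERVER, NANO_LOCAL }

/**
 * Picks a backend for one request, or returns null when the chosen
 * mode cannot be served (assumed here for PI Only while offline).
 */
fun selectBackend(mode: FulfillmentMode, isOnline: Boolean): Backend? = when (mode) {
    FulfillmentMode.NANO_ONLY -> Backend.NANO_LOCAL
    FulfillmentMode.PI_ONLY -> if (isOnline) Backend.PI_SERVER else null
    FulfillmentMode.HYBRID -> if (isOnline) Backend.PI_SERVER else Backend.NANO_LOCAL
}

fun main() {
    // Print the full decision table for every mode and network state.
    for (mode in FulfillmentMode.values()) {
        for (online in listOf(true, false)) {
            println("$mode online=$online -> ${selectBackend(mode, online)}")
        }
    }
}
```

Keeping the decision in one pure function makes the privacy guarantee easy to audit: Nano Only can never return the server backend, whatever the network state.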
What makes this 3-mode design interesting isn't just "strong vs weak model" — it's that the trade-off itself is exposed to the user.
PI Only → Quality ↑↑, Privacy ↓, network-dependent
Nano Only → Quality ↓, Privacy ↑↑, offline OK
Hybrid → Quality ↑, Privacy ~, auto fallback
Why This Architecture — Comparing to Apple Intelligence
This pattern isn't entirely new. Apple Intelligence uses similar routing:
- On-device: Small tasks run locally
- Private Cloud Compute: Larger tasks go to privacy-preserving servers
- ChatGPT delegation: Complex queries can be routed externally
But while Apple decides routing automatically, COSMO lets users choose. This is the key philosophical difference.
| Aspect | Apple Intelligence | Google COSMO |
|---|---|---|
| Mode selection | Automatic (task complexity) | Manual (user choice) |
| Local model | ~3B Apple Foundation Model | Gemini Nano (~3-4B) |
| Server model | Apple Private Cloud Compute | "PI" (Gemini server) |
| External delegation | ChatGPT (optional) | None (Google integrated) |
| Transparency | Mode hidden | 3 modes explicitly exposed |
Google's approach is developer- and power-user-friendly: it makes explicit whether requests leave the device, then lets you choose. That fits the transparency and consent principles of the EU's GDPR and of tightening privacy regulations elsewhere.
14 Skills — Is This Really an "Agent"?
What separates COSMO from a basic chatbot is its 14 pre-defined Skills, which trigger proactively based on user activity.
Productivity Skills
- List Tracker — track to-do lists
- Document Writer — auto-generate documents
- Calendar Event Suggester — propose schedule entries
- Add Timer — set timers
Research & Knowledge Skills
- Deep Research — in-depth research (likely Gemini's Deep Research feature)
- Google it — delegate to search
- Jargon Definitions — auto-explain technical terms
- Provide Insight — context analysis with insights
Memory & Context Skills
- Recall — recall past conversations/activity
- Conversation Summary — summarize discussions
- People Understanding — learn frequently contacted people
- Event Understanding — learn user's event patterns
Visual & Browser Skills
- Quick Photo Lookup — gallery search delegation
- Browser Agent — web automation (uses Project Mariner)
The Browser Agent's use of Project Mariner is the most telling detail. Mariner is Google's browser automation agent, announced in December 2024 as an experimental Chrome extension that operates websites directly. COSMO integrates it as one tool inside the on-device agent.
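To make "Skills that trigger proactively" concrete, here is a minimal sketch of a skill registry. Everything in it is hypothetical: the `Skill` type, the `SkillRegistry` class, and the regex and question-mark triggers are stand-ins I made up for illustration. In COSMO the trigger decision is presumably made by a model, not by rules like these.

```kotlin
// Hypothetical sketch: the Skill type, trigger rules, and sample text are
// illustrative and not taken from the COSMO package.
data class Skill(
    val name: String,
    val category: String,
    val shouldTrigger: (screenText: String) -> Boolean,
    val execute: (screenText: String) -> String,
)

class SkillRegistry(private val skills: List<Skill>) {
    /** Returns every skill whose trigger matches the current screen text. */
    fun matching(screenText: String): List<Skill> =
        skills.filter { it.shouldTrigger(screenText) }
}

// Matches a duration such as "10 minutes" or "1 hour".
val durationPattern = Regex("""\b(\d+)\s*(minute|min|hour)s?\b""", RegexOption.IGNORE_CASE)

val registry = SkillRegistry(
    listOf(
        Skill(
            name = "Add Timer",
            category = "Productivity",
            shouldTrigger = { durationPattern.containsMatchIn(it) },
            execute = { text -> "Suggest timer: ${durationPattern.find(text)?.value}" },
        ),
        Skill(
            name = "Google it",
            category = "Research & Knowledge",
            shouldTrigger = { it.trim().endsWith("?") },
            execute = { text -> "Suggest search: ${text.trim()}" },
        ),
    )
)

fun main() {
    val screen = "Simmer the sauce for 10 minutes"
    for (skill in registry.matching(screen)) {
        println("${skill.name}: ${skill.execute(screen)}")
    }
}
```

The point of the shape is that a skill is data (a trigger plus an action), so adding a fifteenth skill means adding a list entry, not touching the dispatch logic.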
Technical Implementation — AccessibilityService
COSMO uses Android's AccessibilityService API. Originally designed for screen readers and accessibility features, this API is increasingly used by AI agents for screen perception + manipulation.
[User screen]
↓ (AccessibilityService captures)
[Screen context as text]
↓
[Skill trigger decision] ← Nano (cheap) or PI (accurate)
↓ (activate appropriate Skill)
[Skill executes → UI manipulation or response]
This is conceptually similar to Apple's App Intents, but where App Intents requires each app to declare its own actions, AccessibilityService works across all apps, making it more universal. The downside is security: users must grant full screen access, so trust is essential.
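The "screen context as text" step in the flow above can be sketched as a depth-first walk over the UI tree. To keep it runnable on a plain JVM, I use a made-up `UiNode` class as a stand-in for Android's `AccessibilityNodeInfo`; in a real `AccessibilityService` you would walk the tree starting at `rootInActiveWindow` instead. The output format is my own choice, not something recovered from COSMO.

```kotlin
// Hypothetical stand-in for Android's AccessibilityNodeInfo so the sketch
// runs on a plain JVM; a real service would walk rootInActiveWindow instead.
data class UiNode(
    val className: String,
    val text: String? = null,
    val children: List<UiNode> = emptyList(),
)

/** Depth-first walk that keeps only nodes carrying non-blank text. */
fun flattenToText(node: UiNode, out: MutableList<String> = mutableListOf()): List<String> {
    val text = node.text?.trim()
    if (!text.isNullOrEmpty()) out += "${node.className}: $text"
    node.children.forEach { flattenToText(it, out) }
    return out
}

fun main() {
    val screen = UiNode(
        className = "FrameLayout",
        children = listOf(
            UiNode("TextView", "Dinner with Sam"),
            UiNode("TextView", "Friday 7 pm"),
            UiNode("Button", "Reply"),
        ),
    )
    // The joined lines are what a trigger model would receive as context.
    println(flattenToText(screen).joinToString("\n"))
}
```

Dropping textless layout containers keeps the context short, which matters when the consumer is a small on-device model with a limited input window.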
What "AIR" Tells Us
The package path com.google.research.air.cosmo contains an interesting clue: AIR. The most likely meaning:
- Agentic Intelligence Research
- Or a codename for a specific Google Research group
This aligns with Google's Agentic AI emphasis throughout 2025 — signaling that the next-gen AI direction isn't just chatbots, but agents that act on the user's behalf.
Developer Angle — The Rise of Gemini Nano API
Here's where it gets really interesting: the architecture COSMO demonstrates is something developers can build too.
Google's Gemini Nano API provides:
- AICore system service — system-level Nano access on Pixel 8+
- ML Kit GenAI APIs — higher-level abstractions
- Summarization, extraction, rewriting, translation as basic tasks
So anyone can write code like this:
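// On-device summarization with Gemini Nano via ML Kit GenAI (call from a coroutine)
val summarizer = Summarization.getClient(featureOptions) // featureOptions: a SummarizerOptions instance
val request = SummarizationRequest.builder(longText).build()
// runInference returns a ListenableFuture; await() comes from kotlinx-coroutines-guava
val result = summarizer.runInference(request).await()
Combine this with a server-side model, and you get the same hybrid pattern as COSMO. The only difference is "who builds the Skills library."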
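One way to combine the two is to hide both models behind a shared interface and fall back to the local one when the server call fails. This is a sketch under my own assumptions: `TextModel` and `HybridModel` are hypothetical names, and the two backends in `main` are fakes standing in for a cloud client and an ML Kit summarizer.

```kotlin
// Hypothetical interface and backends; neither is a real Google API.
fun interface TextModel {
    fun summarize(input: String): String
}

/** Tries the server model first and falls back to the local one on failure. */
class HybridModel(
    private val server: TextModel,
    private val local: TextModel,
) : TextModel {
    override fun summarize(input: String): String =
        try {
            server.summarize(input)
        } catch (e: Exception) {
            // Any server-side failure (no network, timeout) routes to the local model.
            local.summarize(input)
        }
}

fun main() {
    // Fake backends: the server always fails, the local model always answers.
    val offlineServer = TextModel { throw java.io.IOException("no network") }
    val nano = TextModel { input -> "local summary of ${input.length} chars" }
    println(HybridModel(offlineServer, nano).summarize("some long text"))
}
```

Because `HybridModel` is itself a `TextModel`, callers never need to know which backend answered, and swapping in a PI-only or Nano-only configuration is a one-line change at construction time.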
What This Signals
What the COSMO leak shows:
- Hybrid is becoming the standard — the single-model era is over. Local + server routing is the next baseline.
- Skills are first-class — not chat interfaces, but bundles of triggerable capabilities are becoming the UX core.
- Privacy trade-offs exposed — giving users a choice is the start of trust.
- AccessibilityService redefined — screen perception is now the foundation of every mobile agent.
We don't yet know how COSMO will be officially announced at Google I/O 2026. But this architectural pattern is already standardizing.
Apple did it. Google is following. And developers can start building the same pattern today.
Coming Next — Build It Yourself
In an upcoming post (premium tutorial series), I'll walk through building a 3-mode architecture identical to COSMO using Gemini Nano API + cloud LLM:
- AICore setup and first Nano call
- Hybrid router design (network-state-aware)
- Skills system implementation (Function Calling)
- Screen context capture (AccessibilityService)
- Privacy mode toggle UI
Before Google's official I/O announcement, getting hands-on with the underlying tech is the best preparation. Subscribe to the newsletter to be notified when it drops.
Conclusion
COSMO was an accident, but not a coincidence. It clearly shows where Google is headed:
- Agents (Skills) — chatbots → AI that acts
- Hybrid (Nano + PI) — not all-or-nothing, but routing
- Transparency (3 modes) — choice instead of automation
What gets announced at Google I/O in mid-May is worth watching. But before that, the best learning comes from building the same idea with your own hands.