
Inside Google COSMO — The New Architecture of On-Device AI Agents

Deep-dive into COSMO, Google's next-gen AI assistant accidentally leaked before I/O 2026. Full breakdown of the 3-mode architecture: Gemini Nano + PI server + Hybrid routing.

On May 1, 2026, Google accidentally published a 1.13 GB experimental app called COSMO to the Play Store, then quickly pulled it. Two weeks before Google I/O, this leak revealed the hybrid architecture of next-generation AI assistants.

What Happened

In the early hours of May 1, 2026, Google Research's official Play Store account published a package named com.google.research.air.cosmo. The 1.13 GB download size stood out, and the description read "experimental AI assistant."

Within hours, Google quietly removed the listing, but not before journalists had downloaded the APK and pulled it apart. The research.air package path identifies it as a next-generation assistant prototype from Google Research's AIR group.

With Google I/O 2026 scheduled for mid-May, this looks like an accidental two-week-early reveal of a planned announcement.

The Core Insight — Three Fulfillment Models

The first thing that catches your eye in COSMO's settings is a user-selectable three-mode processing system:

Mode        Behavior                                    Use Case
Hybrid      PI server when online, Nano when offline    Default (assumed), general users
PI Only     Always uses the server PI model             Quality-first, data sharing OK
Nano Only   Always uses local Gemini Nano               Privacy-first, works offline

What "PI" stands for hasn't been officially defined. Personal Intelligence is the most likely interpretation, referring to server-side Gemini models (likely Gemini 2.x Pro or dedicated infrastructure).

What makes this 3-mode design interesting isn't just "strong vs weak model" — it's that the trade-off itself is exposed to the user.

PI Only    → Quality ↑↑, Privacy ↓,   network-dependent
Nano Only  → Quality ↓,  Privacy ↑↑, offline OK
Hybrid     → Quality ↑,  Privacy ~,   auto fallback
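The routing logic above is simple enough to encode directly. A minimal Kotlin sketch, where the `Mode` and `Backend` names are illustrative (nothing here is taken from the leaked APK):

```kotlin
// Which backend serves a request, given the user-selected mode
// and current connectivity. Mirrors the trade-off table above.
enum class Mode { PI_ONLY, NANO_ONLY, HYBRID }
enum class Backend { PI_SERVER, GEMINI_NANO }

fun route(mode: Mode, online: Boolean): Backend = when (mode) {
    Mode.PI_ONLY -> Backend.PI_SERVER      // quality-first; degraded offline
    Mode.NANO_ONLY -> Backend.GEMINI_NANO  // privacy-first; always local
    Mode.HYBRID -> if (online) Backend.PI_SERVER else Backend.GEMINI_NANO
}
```

Note that only Hybrid inspects connectivity; the two "Only" modes are unconditional, which is exactly what makes the trade-off legible to the user.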

Why This Architecture — Comparing to Apple Intelligence

This pattern isn't entirely new. Apple Intelligence uses similar routing:

  • On-device: Small tasks run locally
  • Private Cloud Compute: Larger tasks go to privacy-preserving servers
  • ChatGPT delegation: Complex queries can be routed externally

But while Apple decides routing automatically, COSMO lets users choose. This is the key philosophical difference.

Aspect               Apple Intelligence              Google COSMO
Mode selection       Automatic (task complexity)     Manual (user choice)
Local model          ~3B Apple Foundation Model      Gemini Nano (~3-4B)
Server model         Apple Private Cloud Compute     "PI" (Gemini server)
External delegation  ChatGPT (optional)              None (Google integrated)
Transparency         Mode hidden                     3 modes explicitly exposed

Google's approach is developer/power-user friendly: it shows you which data goes to the server, then lets you choose. This aligns well with EU GDPR and tightening global privacy regulations.

14 Skills — Is This Really an "Agent"?

What separates COSMO from a basic chatbot is its 14 pre-defined Skills, which trigger proactively based on user activity.

Productivity Skills

  • List Tracker — track to-do lists
  • Document Writer — auto-generate documents
  • Calendar Event Suggester — propose schedule entries
  • Add Timer — set timers

Research & Knowledge Skills

  • Deep Research — in-depth research (likely Gemini's Deep Research feature)
  • Google it — delegate to search
  • Jargon Definitions — auto-explain technical terms
  • Provide Insight — context analysis with insights

Memory & Context Skills

  • Recall — recall past conversations/activity
  • Conversation Summary — summarize discussions
  • People Understanding — learn frequently contacted people
  • Event Understanding — learn user's event patterns

Visual & Browser Skills

  • Quick Photo Lookup — gallery search delegation
  • Browser Agent — web automation (uses Project Mariner)

The fact that Browser Agent uses Project Mariner is decisive. Mariner is Google's browser automation agent, announced in December 2024, which manipulates websites directly through a Chrome extension. COSMO integrates this as one tool inside the on-device agent.

Technical Implementation — AccessibilityService

COSMO uses Android's AccessibilityService API. Originally designed for screen readers and accessibility features, this API is increasingly used by AI agents for screen perception + manipulation.

[User screen] 
    ↓ (AccessibilityService captures)
[Screen context as text]
    ↓
[Skill trigger decision]  ← Nano (cheap) or PI (accurate)
    ↓ (activate appropriate Skill)
[Skill executes → UI manipulation or response]
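To make the pipeline concrete, here is a toy Kotlin simulation of the trigger step. Real COSMO reads screen content via AccessibilityService and classifies it with Nano or PI; here the screen context is just a string and each Skill declares trigger keywords. All names and triggers are hypothetical stand-ins:

```kotlin
// A Skill with the phrases that should activate it.
data class Skill(val name: String, val triggers: List<String>)

val skills = listOf(
    Skill("Add Timer", listOf("timer", "minutes")),
    Skill("Calendar Event Suggester", listOf("meeting", "tomorrow at")),
    Skill("Jargon Definitions", listOf("API", "latency"))
)

// The trigger decision: cheap keyword matching stands in for the
// Nano-or-PI classification step shown in the diagram above.
fun matchSkills(screenText: String): List<String> =
    skills
        .filter { s -> s.triggers.any { screenText.contains(it, ignoreCase = true) } }
        .map { it.name }
```

In the real system this step is where the Nano-vs-PI cost trade-off bites: trigger decisions fire constantly, so they need to be cheap, while the activated Skill can afford a heavier model call.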

This is conceptually similar to Apple's App Intents, but applies to all apps, making it more universal. The downside is security — users must grant full screen access, so trust is essential.

What "AIR" Tells Us

The package path com.google.research.air.cosmo contains an interesting clue: AIR. The most likely meaning:

  • Agentic Intelligence Research
  • Or a codename for a specific Google Research group

This aligns with Google's Agentic AI emphasis throughout 2025 — signaling that the next-gen AI direction isn't just chatbots, but agents that act on the user's behalf.

Developer Angle — The Rise of Gemini Nano API

Here's where it gets really interesting: the architecture COSMO demonstrates is something developers can build too.

Google's Gemini Nano API provides:

  • AICore system service — system-level Nano access on Pixel 8+
  • ML Kit GenAI APIs — higher-level abstractions
  • Summarization, extraction, rewriting, translation as basic tasks

So anyone can write code like this:

```kotlin
// On-device summarization with Gemini Nano via the ML Kit GenAI
// Summarization API (shape simplified; requires an AICore-capable device)
val summarizer = Summarization.getClient(featureOptions)
val result = summarizer.runInference(longText).await()
```

Combine this with a server-side model, and you get the same hybrid pattern as COSMO. The only difference is "who builds the Skills library."
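One way to sketch that hybrid wiring in plain Kotlin, with both models stubbed behind a common interface. The `LlmClient` abstraction and the fallback policy are my assumptions, not Google's API:

```kotlin
// Hybrid client: prefer the server model, fall back to on-device
// Nano when the device is offline or the network call fails.
interface LlmClient { fun complete(prompt: String): String }

class HybridClient(
    private val server: LlmClient,   // e.g. a Gemini API wrapper
    private val local: LlmClient,    // e.g. an ML Kit / AICore Nano wrapper
    private val isOnline: () -> Boolean
) : LlmClient {
    override fun complete(prompt: String): String {
        if (!isOnline()) return local.complete(prompt)
        return try {
            server.complete(prompt)
        } catch (e: Exception) {
            local.complete(prompt)   // graceful degradation to Nano
        }
    }
}
```

Swapping `isOnline` for a user-visible mode toggle (PI Only / Nano Only / Hybrid) gets you COSMO's three-mode behavior almost for free; the hard part, as noted, is the Skills library on top.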

What This Signals

What the COSMO leak shows:

  1. Hybrid is becoming the standard — the single-model era is over. Local + server routing is the next baseline.
  2. Skills are first-class — not chat interfaces, but bundles of triggerable capabilities are becoming the UX core.
  3. Privacy trade-offs exposed — giving users a choice is the start of trust.
  4. AccessibilityService redefined — screen perception is now the foundation of every mobile agent.

We don't yet know how COSMO will be officially announced at Google I/O 2026. But this architectural pattern is already standardizing.

Apple did it. Google is following. And soon, developers can build the same pattern.

Coming Next — Build It Yourself

In an upcoming post (premium tutorial series), I'll walk through building a 3-mode architecture modeled on COSMO's, using the Gemini Nano API plus a cloud LLM:

  • AICore setup and first Nano call
  • Hybrid router design (network-state-aware)
  • Skills system implementation (Function Calling)
  • Screen context capture (AccessibilityService)
  • Privacy mode toggle UI

Before Google's official I/O announcement, getting hands-on with the underlying tech is the best preparation. Subscribe to the newsletter to be notified when it drops.

Conclusion

COSMO was an accident, but not a coincidence. It clearly shows where Google is headed:

  • Agents (Skills) — chatbots → AI that acts
  • Hybrid (Nano + PI) — not all-or-nothing, but routing
  • Transparency (3 modes) — choice instead of automation

What gets announced at Google I/O in mid-May is worth watching. But before that, the best learning comes from building the same idea with your own hands.
