
Inside Google COSMO — The New Architecture of On-Device AI Agents

Deep-dive into COSMO, Google's next-gen AI assistant accidentally leaked before I/O 2026. Full breakdown of the 3-mode architecture: Gemini Nano + PI server + Hybrid routing.

On May 1, 2026, Google accidentally published a 1.13 GB experimental app called COSMO to the Play Store, then quickly pulled it. Two weeks before Google I/O, this leak revealed the hybrid architecture of next-generation AI assistants.

What Happened

In the early hours of May 1, 2026, Google Research's official Play Store account published a package named com.google.research.air.cosmo. The 1.13 GB download size stood out, and the description read "experimental AI assistant."

Within hours, Google quietly removed the listing, but not before journalists had downloaded the APK and pulled it apart. The research.air package path identifies it as a next-generation assistant prototype from Google Research's AIR group.

With Google I/O 2026 scheduled for mid-May, this looks like an accidental two-week-early reveal of a planned announcement.

The Core Insight — Three Fulfillment Models

The first thing that catches your eye in COSMO's settings is a user-selectable three-mode processing system:

Mode        Behavior                                    Use Case
Hybrid      PI server when online, Nano when offline    Default (assumed), general users
PI Only     Always uses the server PI model             Quality-first, data sharing OK
Nano Only   Always uses local Gemini Nano               Privacy-first, works offline

What "PI" stands for hasn't been officially defined. Personal Intelligence is the most likely interpretation, referring to server-side Gemini models (likely Gemini 2.x Pro or dedicated infrastructure).

What makes this 3-mode design interesting isn't just "strong vs weak model" — it's that the trade-off itself is exposed to the user.

PI Only    → Quality ↑↑, Privacy ↓,   network-dependent
Nano Only  → Quality ↓,  Privacy ↑↑, offline OK
Hybrid     → Quality ↑,  Privacy ~,   auto fallback
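The routing logic above is simple enough to encode directly. A minimal Kotlin sketch, where the `Mode` and `Backend` names are illustrative (nothing here is taken from the leaked APK):

```kotlin
// Which backend serves a request, given the user-selected mode
// and current connectivity. Mirrors the trade-off table above.
enum class Mode { PI_ONLY, NANO_ONLY, HYBRID }
enum class Backend { PI_SERVER, GEMINI_NANO }

fun route(mode: Mode, online: Boolean): Backend = when (mode) {
    Mode.PI_ONLY -> Backend.PI_SERVER      // quality-first; degraded offline
    Mode.NANO_ONLY -> Backend.GEMINI_NANO  // privacy-first; always local
    Mode.HYBRID -> if (online) Backend.PI_SERVER else Backend.GEMINI_NANO
}
```

Note that only Hybrid inspects connectivity; the two "Only" modes are unconditional, which is exactly what makes the trade-off legible to the user.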

Why This Architecture — Comparing to Apple Intelligence

This pattern isn't entirely new. Apple Intelligence uses similar routing:

  • On-device: Small tasks run locally
  • Private Cloud Compute: Larger tasks go to privacy-preserving servers
  • ChatGPT delegation: Complex queries can be routed externally

But while Apple decides routing automatically, COSMO lets users choose. This is the key philosophical difference.

Aspect               Apple Intelligence              Google COSMO
Mode selection       Automatic (task complexity)     Manual (user choice)
Local model          ~3B Apple Foundation Model      Gemini Nano (~3-4B)
Server model         Apple Private Cloud Compute     "PI" (Gemini server)
External delegation  ChatGPT (optional)              None (Google integrated)
Transparency         Mode hidden                     3 modes explicitly exposed

Google's approach is developer/power-user friendly: it shows you which data goes to the server, then lets you choose. This aligns well with EU GDPR and tightening global privacy regulations.

14 Skills — Is This Really an "Agent"?

What separates COSMO from a basic chatbot is its 14 pre-defined Skills, which trigger proactively based on user activity.

Productivity Skills

  • List Tracker — track to-do lists
  • Document Writer — auto-generate documents
  • Calendar Event Suggester — propose schedule entries
  • Add Timer — set timers

Research & Knowledge Skills

  • Deep Research — in-depth research (likely Gemini's Deep Research feature)
  • Google it — delegate to search
  • Jargon Definitions — auto-explain technical terms
  • Provide Insight — context analysis with insights

Memory & Context Skills

  • Recall — recall past conversations/activity
  • Conversation Summary — summarize discussions
  • People Understanding — learn frequently contacted people
  • Event Understanding — learn user's event patterns

Visual & Browser Skills

  • Quick Photo Lookup — gallery search delegation
  • Browser Agent — web automation (uses Project Mariner)

The fact that Browser Agent uses Project Mariner is decisive. Mariner is Google's browser automation agent, announced in December 2024, which manipulates websites directly through a Chrome extension. COSMO integrates this as one tool inside the on-device agent.

Technical Implementation — AccessibilityService

COSMO uses Android's AccessibilityService API. Originally designed for screen readers and accessibility features, this API is increasingly used by AI agents for screen perception + manipulation.

[User screen] 
    ↓ (AccessibilityService captures)
[Screen context as text]
    ↓
[Skill trigger decision]  ← Nano (cheap) or PI (accurate)
    ↓ (activate appropriate Skill)
[Skill executes → UI manipulation or response]
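To make the pipeline concrete, here is a toy Kotlin simulation of the trigger step. Real COSMO reads screen content via AccessibilityService and classifies it with Nano or PI; here the screen context is just a string and each Skill declares trigger keywords. All names and triggers are hypothetical stand-ins:

```kotlin
// A Skill with the phrases that should activate it.
data class Skill(val name: String, val triggers: List<String>)

val skills = listOf(
    Skill("Add Timer", listOf("timer", "minutes")),
    Skill("Calendar Event Suggester", listOf("meeting", "tomorrow at")),
    Skill("Jargon Definitions", listOf("API", "latency"))
)

// The trigger decision: cheap keyword matching stands in for the
// Nano-or-PI classification step shown in the diagram above.
fun matchSkills(screenText: String): List<String> =
    skills
        .filter { s -> s.triggers.any { screenText.contains(it, ignoreCase = true) } }
        .map { it.name }
```

In the real system this step is where the Nano-vs-PI cost trade-off bites: trigger decisions fire constantly, so they need to be cheap, while the activated Skill can afford a heavier model call.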

This is conceptually similar to Apple's App Intents, but applies to all apps, making it more universal. The downside is security — users must grant full screen access, so trust is essential.

What "AIR" Tells Us

The package path com.google.research.air.cosmo contains an interesting clue: AIR. The most likely meaning:

  • Agentic Intelligence Research
  • Or a codename for a specific Google Research group

This aligns with Google's Agentic AI emphasis throughout 2025 — signaling that the next-gen AI direction isn't just chatbots, but agents that act on the user's behalf.

Developer Angle — The Rise of Gemini Nano API

Here's where it gets really interesting: the architecture COSMO demonstrates is something developers can build too.

Google's Gemini Nano API provides:

  • AICore system service — system-level Nano access on Pixel 8+
  • ML Kit GenAI APIs — higher-level abstractions
  • Summarization, extraction, rewriting, translation as basic tasks

So anyone can write code like this:

```kotlin
// On-device summarization with Gemini Nano via the ML Kit GenAI
// Summarization API (shape simplified; requires an AICore-capable device)
val summarizer = Summarization.getClient(featureOptions)
val result = summarizer.runInference(longText).await()
```

Combine this with a server-side model, and you get the same hybrid pattern as COSMO. The only difference is "who builds the Skills library."
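One way to sketch that hybrid wiring in plain Kotlin, with both models stubbed behind a common interface. The `LlmClient` abstraction and the fallback policy are my assumptions, not Google's API:

```kotlin
// Hybrid client: prefer the server model, fall back to on-device
// Nano when the device is offline or the network call fails.
interface LlmClient { fun complete(prompt: String): String }

class HybridClient(
    private val server: LlmClient,   // e.g. a Gemini API wrapper
    private val local: LlmClient,    // e.g. an ML Kit / AICore Nano wrapper
    private val isOnline: () -> Boolean
) : LlmClient {
    override fun complete(prompt: String): String {
        if (!isOnline()) return local.complete(prompt)
        return try {
            server.complete(prompt)
        } catch (e: Exception) {
            local.complete(prompt)   // graceful degradation to Nano
        }
    }
}
```

Swapping `isOnline` for a user-visible mode toggle (PI Only / Nano Only / Hybrid) gets you COSMO's three-mode behavior almost for free; the hard part, as noted, is the Skills library on top.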

What This Signals

What the COSMO leak shows:

  1. Hybrid is becoming the standard — the single-model era is over. Local + server routing is the next baseline.
  2. Skills are first-class — not chat interfaces, but bundles of triggerable capabilities are becoming the UX core.
  3. Privacy trade-offs exposed — giving users a choice is the start of trust.
  4. AccessibilityService redefined — screen perception is now the foundation of every mobile agent.

We don't yet know how COSMO will be officially announced at Google I/O 2026. But this architectural pattern is already standardizing.

Apple did it. Google is following. And soon, developers can build the same pattern.

Coming Next — Build It Yourself

In an upcoming post (premium tutorial series), I'll walk through building a 3-mode architecture modeled on COSMO's, using the Gemini Nano API plus a cloud LLM:

  • AICore setup and first Nano call
  • Hybrid router design (network-state-aware)
  • Skills system implementation (Function Calling)
  • Screen context capture (AccessibilityService)
  • Privacy mode toggle UI

Before Google's official I/O announcement, getting hands-on with the underlying tech is the best preparation. Subscribe to the newsletter to be notified when it drops.

Conclusion

COSMO was an accident, but not a coincidence. It clearly shows where Google is headed:

  • Agents (Skills) — chatbots → AI that acts
  • Hybrid (Nano + PI) — not all-or-nothing, but routing
  • Transparency (3 modes) — choice instead of automation

What gets announced at Google I/O in mid-May is worth watching. But before that, the best learning comes from building the same idea with your own hands.
