Claude Sonnet 4.6: Opus-Level Performance, 40% Cheaper — Benchmark Deep Dive
Claude Sonnet 4.6 scores 79.6% on SWE-bench, 72.5% on OSWorld, and 1633 Elo on GDPval-AA, matching or beating Opus 4.6 on production tasks at $3/$15 versus $5/$25 per million input/output tokens. Includes an analysis of Adaptive Thinking, Context Compaction, and the OSWorld growth trajectory.

Did Sonnet Just Beat Opus? — Claude Sonnet 4.6 Benchmark Deep Dive
Anthropic released Claude Sonnet 4.6 on February 17, and it outperforms the flagship Opus 4.6 on several key benchmarks at roughly 40% lower cost. It isn't a "cheaper knock-off"; the gains come from structural changes at the architecture level.
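To see where the "roughly 40%" figure comes from, here is a minimal Python sketch that prices a workload at the list rates quoted above. It assumes the first number in each pair is the price per million input tokens and the second is the price per million output tokens. The model keys, the helper names, and the workload size are hypothetical, and the sketch ignores prompt caching, batch discounts, and any long-context surcharges.

```python
# List prices quoted in this article, in USD per million tokens.
# Assumption: first figure = input tokens, second figure = output tokens.
PRICES = {
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "opus-4.6": {"input": 5.0, "output": 25.0},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one workload at list price (no caching or batch discounts)."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def savings_pct(input_tokens: int, output_tokens: int) -> float:
    """Percentage saved by running the workload on Sonnet instead of Opus."""
    opus = cost_usd("opus-4.6", input_tokens, output_tokens)
    sonnet = cost_usd("sonnet-4.6", input_tokens, output_tokens)
    return 100 * (opus - sonnet) / opus


# Hypothetical workload: 2M input tokens, 500k output tokens.
print(cost_usd("sonnet-4.6", 2_000_000, 500_000))  # 13.5
print(cost_usd("opus-4.6", 2_000_000, 500_000))    # 22.5
print(round(savings_pct(2_000_000, 500_000), 1))   # 40.0
```

Because both rates drop by the same ratio (3/5 = 15/25 = 0.6), the saving works out to 40% for any mix of input and output tokens, so at list price the figure holds regardless of workload shape.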
Opus vs Sonnet: What Changed?
The old Opus-Sonnet dynamic was straightforward. Opus was the full-spec brain; Sonnet was the compressed version. Same architecture, smaller size, naturally lower performance.
In the 4.6 generation, that formula breaks.