Most of the value is cache, not work.

The reports say Codex $200 is worth ~$8,600/mo and Claude $200 ~$18,900/mo. Here is the proof that "Claude is worth 2× more" is misleading - and what is actually true.

Your tokens today
92.7%
are cache reads, not new work
That are output
0.6%
the only real "work" line
Real output value
≈ tie
$977 vs $911 / mo
The 2× gap
artifact
cache pricing + extrapolation
01 / the symptom

Where your tokens go vs where the dollars go

Measured from your own logs (~/.claude/projects, deduped by request, 2026-05-30). Almost everything is cache. Output - the actual work - is a sliver of the count and a small share of the dollars.

Token count
89.8M tokens today
API-equivalent dollars
$113 today, at Opus list
Cache read Cache write Output (real work) Fresh input

~87% of the dollars are caching overhead; ~12% is output. Caching does not stop the model re-reading the whole context on every call - it just makes the re-read cheap. The token count stays huge because it equals context size × number of calls (~110K re-read × 758 calls today).

02 / the core proof

Both $200 plans, decomposed to the same monthly basis

Each plan's value split into output vs cache, priced at its own provider's API list, scaled to 100% of monthly quota. The output slivers (pink) are nearly identical. The entire headline gap is the cache bars.

Output (real work) Cache re-read ($0.50/1M) First read (input / cache-write)

Codex total $8,589 · Claude total $18,891. Output: $977 vs $911. Input/cache side: $7,612 vs $17,980 (2.4×) - that is where 100% of the gap lives.

03 / the punchline

Strip away cache: the real work is a tie

Output is the least gameable metric - the tokens the model actually generated for you. On a monthly, full-quota basis the two $200 plans are within ~7% of each other.

Output value per month (real work)
$977
Codex $200
$911
Claude $200

In raw output tokens: ~32.6M/mo (Codex) vs ~36.4M/mo (Claude) - a 1.12× edge, nowhere near the 2× the headline implies.

04 / why it looks 2×

Three calculation asymmetries inflate Claude's number

None of these reflect more work delivered. They reflect how the number was built.

Extrapolation multiplier
lower = more reliable

Codex was measured at 99% quota. Claude was one day at 10%, multiplied ×10 - any error amplified 10×.

Price to cache context
first read, per 1M

Anthropic charges 2× for a "cache write" vs OpenAI's normal input. Same behavior, pricier meter.

Context re-reads
per output token

Claude re-reads ~2× more context per unit of output, bloating the cache-read line.

05 / watch the gap melt

Remove the artifacts and $18.9k slides toward $8.6k

Starting from Claude's headline, undo each calculation artifact in turn. The gap to Codex's measured number shrinks at every step.

Re-pricing cache writes at Codex's rate: $18,891 → $15,562. Adjusting the one-day ×10 extrapolation for a plausibly heavy sample day (÷1.5): → ~$12,600. Codex, essentially measured: $8,589. The residual is genuine extra context churn, not extra work.

06 / the honest answer

What is actually true

Which $200 plan does more real work? tieRoughly equal - output ~$977 vs ~$911 / mo.
Which API would bill more for this usage at full quota? Claude - but because it charges more for the same caching and the figure is a ×10 extrapolation, not because you get more done.
Is "$9k vs $20k = Claude worth 2×" a fair comparison? noIt compares a measurement to a one-day ×10 guess, across two pricing models that treat caching differently, on a metric dominated by context overhead.

The deeper trap is the word "value." API-equivalent value measures what the provider's meter would charge, so it rewards verbose context, heavy cache re-reads, and a pricier cache-write rate. A plan can score "2× more valuable" while delivering identical output. That is exactly what happened here.