The reports say Codex $200 is worth ~$8,600/mo and Claude $200 ~$18,900/mo. Here is the proof that comparing those two numbers as "Claude is worth 2× more" is misleading - and what is actually true.
Measured from your own logs (~/.claude/projects, deduped by request, 2026-05-30). Almost everything is cache. Output - the actual work - is a sliver of the count and a small share of the dollars.
~87% of the dollars are caching overhead; ~12% is output. Caching does not stop the model re-reading the whole context on every call - it just makes the re-read cheap. The token count stays huge because it equals context size × number of calls (~110K re-read × 758 calls today).
Each plan's value split into output vs cache, priced at its own provider's API list, scaled to 100% of monthly quota. The output slivers (indigo) are nearly identical. The entire headline gap is the gray cache bars.
Codex total $8,589 · Claude total $18,891. Output: $977 vs $911. Input/cache side: $7,612 vs $17,980 (2.4×) - that is where 100% of the gap lives.
Output is the least gameable metric - the tokens the model actually generated for you. On a monthly, full-quota basis the two $200 plans are within ~7% of each other.
In raw output tokens: ~32.6M/mo (Codex) vs ~36.4M/mo (Claude) - a 1.12× edge, nowhere near the 2× the headline implies.
None of these reflect more work delivered. They reflect how the number was built.
Codex was measured at 99% quota. Claude was one day at 10%, multiplied ×10 - any error amplified 10×.
Anthropic charges 2× for a "cache write" vs OpenAI's normal input. Same behavior, pricier meter.
Claude re-reads ~2× more context per unit of output, bloating the cache-read line.
Starting from Claude's headline, undo each calculation artifact in turn. The gap to Codex's measured number shrinks at every step.
Re-pricing cache writes at Codex's rate: $18,891 → $15,562. Adjusting the one-day ×10 extrapolation for a plausibly heavy sample day (÷1.5): → ~$12,600. Codex, essentially measured: $8,589. The residual is genuine extra context churn, not extra work.
| Which $200 plan does more real work? | tieRoughly equal - output ~$977 vs ~$911 / mo. |
| Which API would bill more for this usage at full quota? | Claude - but because it charges more for the same caching and the figure is a ×10 extrapolation, not because you get more done. |
| Is "$9k vs $20k = Claude worth 2×" a fair comparison? | noIt compares a measurement to a one-day ×10 guess, across two pricing models that treat caching differently, on a metric dominated by context overhead. |
The deeper trap is the word "value." API-equivalent value measures what the provider's meter would charge, so it rewards verbose context, heavy cache re-reads, and a pricier cache-write rate. A plan can score "2× more valuable" while delivering identical output. That is exactly what happened here.