Ryan Skidmore works out the tokenomics of Anthropic's prompt cache and lands on a single rule: if you expect to need a cached prefix again within 62.5 minutes, keep it warm with cheap reads; past that, let it expire and pay for a rewrite. Each read resets the 5-minute TTL, so staying warm costs one 0.10x read per 5-minute window, while letting the cache lapse costs one 1.25x cache write (both relative to base input price). Break-even is therefore 5 * (1.25 / 0.10) = 62.5 minutes, regardless of model or prefix size.
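A minimal sketch of that arithmetic, assuming (as the rule does) that keeping a prefix warm costs one cached read per 5-minute window and that letting it lapse costs one 5-minute cache write. The function names are illustrative only, not part of any SDK. The prefix size multiplies both sides, which is why it drops out of the break-even point.

```python
# Pricing multipliers relative to the base input price, as quoted above.
WRITE_5M = 1.25     # writing a prefix into the 5-minute cache
READ = 0.10         # reading a cached prefix (each read also resets the TTL)
TTL_MINUTES = 5.0


def cost_keep_warm(gap_minutes: float, prefix_tokens: int) -> float:
    """Hypothetical helper: one keep-alive read per TTL window across the gap."""
    return (gap_minutes / TTL_MINUTES) * READ * prefix_tokens


def cost_rewrite(prefix_tokens: int) -> float:
    """Hypothetical helper: let the cache expire, then write the prefix again."""
    return WRITE_5M * prefix_tokens


def break_even_minutes(write: float = WRITE_5M, read: float = READ,
                       ttl: float = TTL_MINUTES) -> float:
    """Gap at which both strategies cost the same: ttl * write / read."""
    return ttl * (write / read)


def keep_warm(expected_gap_minutes: float) -> bool:
    """True if refreshing with reads beats letting the prefix expire."""
    return expected_gap_minutes < break_even_minutes()


print(break_even_minutes())  # 5 * (1.25 / 0.10)
print(keep_warm(30), keep_warm(90))
```

Because `prefix_tokens` appears as a factor in both cost functions, a 2,000-token and a 200,000-token prefix reach break-even at the same gap.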
He flags three cache footguns that silently torch the math: Opus 4.7 uses a new tokenizer that can balloon the same text by as much as 35%; prefixes under the model's minimum (4,096 tokens for Opus, 1,024 for Sonnet) don't cache at all and raise no error; and a cache breakpoint only looks back 20 content blocks, so long agent transcripts need an earlier breakpoint to keep getting cache hits.
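The last two footguns can be guarded against when building a request. The sketch below is an assumption-laden illustration, not Skidmore's code: it uses the minimums quoted above, takes a token count that should come from the target model's own tokenizer (a count made for a different model can be off after a tokenizer change), and treats the transcript as a flat list of content blocks, whereas real Messages API requests nest blocks inside messages. The `{"type": "ephemeral"}` `cache_control` marker and the limit of four breakpoints per request come from Anthropic's prompt-caching docs; spacing the breakpoints 20 blocks apart is a hypothetical strategy for keeping each lookback window within reach of the previous cache entry.

```python
from copy import deepcopy

# Minimum cacheable prefix lengths as quoted in the article (check current docs).
MIN_CACHEABLE_TOKENS = {"opus": 4096, "sonnet": 1024}
LOOKBACK_BLOCKS = 20   # approximate lookback window per breakpoint
MAX_BREAKPOINTS = 4    # per-request breakpoint limit


def will_cache(prefix_tokens: int, family: str) -> bool:
    """Shorter prefixes are silently not cached, so check before relying on it."""
    return prefix_tokens >= MIN_CACHEABLE_TOKENS[family]


def place_breakpoints(blocks, lookback=LOOKBACK_BLOCKS, max_breakpoints=MAX_BREAKPOINTS):
    """Return a copy of `blocks` with cache_control on the last block and on
    earlier blocks spaced `lookback` apart, so that a transcript which grew by
    more than `lookback` blocks since the last request can still find its
    previous cache entry through one of the earlier breakpoints."""
    out = deepcopy(blocks)  # leave the caller's transcript untouched
    idx = len(out) - 1
    placed = 0
    while idx >= 0 and placed < max_breakpoints:
        out[idx]["cache_control"] = {"type": "ephemeral"}
        placed += 1
        idx -= lookback
    return out


transcript = [{"type": "text", "text": f"block {i}"} for i in range(50)]
marked = place_breakpoints(transcript)
print([i for i, b in enumerate(marked) if "cache_control" in b])
print(will_cache(1000, "sonnet"), will_cache(2000, "opus"))
```

With four breakpoints 20 blocks apart, a previous cache entry that ended up to roughly 80 blocks back can still be found; longer gaps between requests would need a different layout.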
He extends the same model-independent math to /compact, where break-even depends only on the compression ratio: roughly 10:1 pays back in 8 turns, 5:1 in 17, and anything near 2:1 is just an expensive summary that also risks dropping the error message your agent will then have to rediscover.
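The exact /compact cost model isn't spelled out here, so the following is an assumed reconstruction that happens to reproduce the quoted figures. Compaction reads the full prefix once from cache (0.10x), generates the summary as output tokens (assumed here to cost 5x base input), and writes the compressed prefix to the 5-minute cache (1.25x). Every later turn then saves a 0.10x cached read on the tokens that were removed. All names are hypothetical.

```python
# Hypothetical /compact cost model, in units of base input price per token.
READ = 0.10       # cached read
WRITE_5M = 1.25   # 5-minute cache write
OUTPUT = 5.0      # assumed output-token price relative to base input


def compact_cost(prefix_tokens: float, ratio: float) -> float:
    """One-time cost of compacting a prefix at `ratio`:1."""
    summary = prefix_tokens / ratio
    return (READ * prefix_tokens      # read the full transcript from cache
            + OUTPUT * summary        # generate the summary as output
            + WRITE_5M * summary)     # cache the compressed prefix


def per_turn_savings(prefix_tokens: float, ratio: float) -> float:
    """Each later turn reads the summary instead of the full prefix."""
    return READ * (prefix_tokens - prefix_tokens / ratio)


def break_even_turns(ratio: float, prefix_tokens: float = 100_000) -> float:
    """Turns until the savings repay the compaction; prefix size cancels out."""
    return compact_cost(prefix_tokens, ratio) / per_turn_savings(prefix_tokens, ratio)


for r in (10, 5, 2):
    print(f"{r}:1 -> {break_even_turns(r):.1f} turns")
```

Under these assumptions the loop prints about 8.1, 16.9 and 64.5 turns. The first two round to the article's 8 and 17, and the 2:1 case shows why such a weak compression is rarely worth it.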