Fable 5 costs twice as much as Opus. Except two is the wrong number — in both directions

Anthropic dropped Claude Fable 5 at $10/$50 and half of Twitter closed the budget debate with "2x more expensive, I'll wait." But a new tokenizer, always-on thinking, and the refusal mechanics make two the wrong number — up and down. What you need to price is the finished outcome.

Jakub Kontra
Developer

Ten dollars per million input tokens, fifty per million output — the first numbers I saw for Claude Fable 5, and also the least useful ones. Yet that's exactly where half of Twitter is closing the budget debate with "2x more expensive than Opus, I'll wait." Me, I opened a spreadsheet with last year's Opus invoices instead and started counting. And it turns out that multiplying last year's bill by two is a mistake — in both directions.

Twice the price of Opus. So what?

Facts first. On Tuesday, June 9, 2026, Anthropic released Claude Fable 5: GA on the Claude API, Bedrock, Vertex AI, and Microsoft Foundry, API id claude-fable-5. About the names, so you don't drown in them: "Mythos-class" is a new tier above the Opus class, Fable 5 is its public model, and Claude Mythos 5 is its special edition. Mythos 5 runs on the same underlying model as Fable, but approved organizations get it through Project Glasswing with some of the safeguards removed. Cybersecurity partners have the cyber classifiers turned off; selected bio researchers have the bio classifiers turned off but keep the cyber ones. So it's not a "model without classifiers," just one selectively relaxed in a single domain. It replaces the previous invite-only Mythos Preview.

The tier pricing matches: 10 dollars per million input tokens, 50 per million output, for both models. Opus 4.8 costs 5 and 25, so Fable is exactly 2x more expensive per token. Opus 4.8, by the way, has the same 1M context window and 128K output, so the specs don't justify the doubling; you're paying for capability. Compared to Mythos Preview, it's less than half the price according to Anthropic, so against that baseline the direction is down, not up.

Two operational conditions right up front, because in some compliance regimes they decide before price does: there's a mandatory 30-day data retention (zero data retention can't be negotiated), and your data isn't used for training. The model also went through over 1,000 hours of external red-teaming and no universal jailbreak was found.

But "2x more expensive per token" and "2x more expensive to operate" are two different sentences. Between them sit three line items the price list doesn't show.

Item one: tokens aren't your tokens anymore

Fable 5 has a new tokenizer. The same content breaks down into roughly 30% more tokens than on Opus-tier models. The exact delta varies by workload and nobody will promise you yours. Practical consequence: all the token counts you have in dashboards and spreadsheets from Opus don't hold for Fable. Your "million tokens" is no longer the same million.

The good news: you can measure it upfront and for free — token counting isn't billed. The count_tokens endpoint, when queried with model: "claude-fable-5", returns counts under both tokenizers at once: input_tokens for the new one and input_tokens_prior_tokenizer for the old one. Run your real prompts through it and you have your own delta instead of someone else's estimate.
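Once you have those responses, the delta is plain arithmetic. A minimal sketch, assuming each response has already been fetched and parsed into a dict carrying the two fields named above. The helper name tokenizer_delta and the sample counts are made up for illustration, and the HTTP call itself is left out.

```python
def tokenizer_delta(count_responses):
    """Return the relative token growth of the new tokenizer over the old one.

    Each item is assumed to be a parsed count_tokens response (a dict) with
    the fields 'input_tokens' and 'input_tokens_prior_tokenizer'.
    """
    new_total = sum(r["input_tokens"] for r in count_responses)
    old_total = sum(r["input_tokens_prior_tokenizer"] for r in count_responses)
    if old_total == 0:
        raise ValueError("no tokens counted under the prior tokenizer")
    return new_total / old_total - 1.0


# Made-up sample counts, NOT real measurements.
sample = [
    {"input_tokens": 1300, "input_tokens_prior_tokenizer": 1000},
    {"input_tokens": 2500, "input_tokens_prior_tokenizer": 2000},
]
delta = tokenizer_delta(sample)

# Per unit of content, the 2x list-price ratio is scaled by the token growth.
price_multiplier = 2.0 * (1.0 + delta)
print(f"delta: {delta:+.1%}, per-content price multiplier: {price_multiplier:.2f}x")
```

Summing over all prompts before dividing weights long prompts more heavily, which matches how the bill is computed. The multiplier here covers the tokenizer only; it says nothing yet about thinking or retries.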

Item two: thinking doesn't turn off

Opus 4.8 thinks too, but there you can turn thinking off. With Fable 5 you can't: adaptive thinking is always on and always billed, and an attempt to send thinking: {type: "disabled"} ends in HTTP 400. You don't get the raw chain-of-thought; via display: "summarized" you at least get a readable summary, but you're billed even for the tokens that stay hidden. All you can tune is output_config.effort, from low to max.

And effort comes with an operational reality that hardly anyone factors into the price: a single request at high effort can easily run for 15 minutes. That means streaming, async UX, and proper timeouts. Otherwise your app simply won't run. Rewriting synchronous "request–response" code is also a budget line item; it just isn't measured in dollars per token.
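What "proper timeouts" can look like in async code, as a rough sketch: wrap the long call in an explicit deadline and turn a timeout into a state the UI can show. Everything named here (run_with_deadline, fake_request, the 20-minute budget) is a hypothetical stand-in, not part of any SDK; in a real app make_request would wrap your streaming API call.

```python
import asyncio

# Assumption: a budget comfortably above the ~15 minutes mentioned above.
HIGH_EFFORT_TIMEOUT_S = 20 * 60


async def run_with_deadline(make_request, timeout_s=HIGH_EFFORT_TIMEOUT_S):
    """Await make_request() but give up after timeout_s seconds.

    make_request is any zero-argument callable returning an awaitable.
    """
    try:
        result = await asyncio.wait_for(make_request(), timeout=timeout_s)
        return {"status": "done", "result": result}
    except asyncio.TimeoutError:
        # wait_for cancels the pending request before raising.
        return {"status": "timed_out", "result": None}


async def fake_request(seconds):
    """Stand-in for a slow model call; it only sleeps."""
    await asyncio.sleep(seconds)
    return "finished"


fast = asyncio.run(run_with_deadline(lambda: fake_request(0.01), timeout_s=1.0))
slow = asyncio.run(run_with_deadline(lambda: fake_request(1.0), timeout_s=0.05))
print(fast["status"], slow["status"])
```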

Item three: refusal and fallback, a.k.a. the "number of attempts" term

The third item isn't a multiplier on the per-token price — it belongs directly in the "number of attempts" term of the equation. What happens when the model refuses? Fable 5 runs with safety classifiers in three domains: cyber (offensive cybersecurity), bio (viral design, for instance), and reasoning_extraction (extracting reasoning for distillation). According to Anthropic, they trigger in less than 5% of sessions; that's an average across all traffic, not a guarantee for your workload. And occasionally they catch a harmless neighboring request too. Security tooling and life sciences take the hit most often.

To be fair: a pre-output refusal isn't billed and arrives as an HTTP 200 with stop_reason: "refusal". Your code should treat it as a state, not an exception. But watch out for one variant: a refusal can also arrive mid-stream, and you've already paid for the output tokens streamed up to that point. It's the only billed refusal, so it deserves its own line in your cost model.
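Treating refusal as a state could look roughly like this. It's a sketch, assuming the response has been parsed into a dict with stop_reason and a usage block, and assuming a mid-stream refusal reports the already streamed tokens in usage.output_tokens while a pre-output refusal reports zero. The function name and the state labels are mine.

```python
def classify_response(response):
    """Map a parsed response (a dict) to a state for the cost model.

    Assumption: a mid-stream refusal reports the already streamed tokens in
    usage['output_tokens'], while a pre-output refusal reports zero.
    """
    output_tokens = response.get("usage", {}).get("output_tokens", 0)
    if response.get("stop_reason") != "refusal":
        return {"state": "completed", "billed_output_tokens": output_tokens}
    if output_tokens == 0:
        # Pre-output refusal: nothing was generated, nothing is billed.
        return {"state": "refused_before_output", "billed_output_tokens": 0}
    # Mid-stream refusal: the tokens streamed so far are already paid for.
    return {"state": "refused_mid_stream", "billed_output_tokens": output_tokens}


# Hypothetical responses, for illustration only.
ok = classify_response({"stop_reason": "end_turn", "usage": {"output_tokens": 800}})
early = classify_response({"stop_reason": "refusal", "usage": {"output_tokens": 0}})
late = classify_response({"stop_reason": "refusal", "usage": {"output_tokens": 350}})
print(ok["state"], early["state"], late["state"])
```

No exception is raised anywhere: all three cases come back as ordinary values, so the caller can log the billed tokens and decide on a fallback in one place.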

Accounting for refusal means having a fallback designed and priced upfront, not improvised after your first production run falls over. There are three options: the server-side fallbacks parameter (beta, handled in a single round-trip), middleware in the SDK, or a manual retry with a "fallback credit" that refunds you the cache-write cost of the switch. The only supported target is claude-opus-4-8, so your fallback chain is laid out for you.
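The manual-retry option is the easiest one to sketch. The version below assumes you inject your own send(model, prompt) function that returns a parsed response dict; send_with_fallback and fake_send are hypothetical names, and the server-side fallbacks parameter and the fallback credit are not modeled here.

```python
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4-8"


def send_with_fallback(send, prompt):
    """Try the primary model; on a refusal, retry once on the fallback.

    send is a caller-supplied function (model, prompt) -> response dict.
    Returns (model_that_answered, response, attempts).
    """
    response = send(PRIMARY_MODEL, prompt)
    if response.get("stop_reason") != "refusal":
        return PRIMARY_MODEL, response, 1
    return FALLBACK_MODEL, send(FALLBACK_MODEL, prompt), 2


def fake_send(model, prompt):
    """Stand-in for a real API call: the primary refuses prompts tagged [cyber]."""
    if model == PRIMARY_MODEL and "[cyber]" in prompt:
        return {"stop_reason": "refusal"}
    return {"stop_reason": "end_turn", "model": model}


plain = send_with_fallback(fake_send, "summarize this invoice")
flagged = send_with_fallback(fake_send, "[cyber] audit this firewall rule")
print(plain[0], plain[2], flagged[0], flagged[2])
```

Returning the attempt count alongside the response is deliberate: it is the number that feeds the "number of attempts" term of your cost model.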

The equation the price list doesn't show

The cost of a finished outcome isn't the cost per token. It's roughly: (tokens × tokenizer × thinking) × number of attempts + human oversight. The first two items live in the first bracket, refusal and fallback in the number of attempts; the price list shows you only a piece of the first bracket.
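Written out as code, the equation might look like the sketch below. Two things are my assumptions rather than anything from a price list: the thinking factor is applied to output tokens only, and every workload number in the example is invented purely to show the mechanics. Only the per-token prices (5/25 and 10/50) come from above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    input_tokens: int        # per attempt, counted under the old tokenizer
    output_tokens: int       # visible output per attempt, old tokenizer
    tokenizer_factor: float  # 1.0 for Opus; measure your own for Fable
    thinking_factor: float   # billed output tokens / visible output tokens
    attempts: float          # average attempts per finished outcome
    oversight_usd: float     # human review cost per finished outcome


def outcome_cost(w, input_price, output_price):
    """(tokens x tokenizer x thinking) x attempts + human oversight, in USD.

    Prices are USD per million tokens.
    """
    tokens_in = w.input_tokens * w.tokenizer_factor
    tokens_out = w.output_tokens * w.tokenizer_factor * w.thinking_factor
    per_attempt = tokens_in / 1e6 * input_price + tokens_out / 1e6 * output_price
    return per_attempt * w.attempts + w.oversight_usd


# Invented workloads, NOT measurements.
opus = Workload(200_000, 10_000, 1.0, 2.0, attempts=3, oversight_usd=20.0)
fable = Workload(200_000, 10_000, 1.3, 3.0, attempts=1, oversight_usd=5.0)

opus_cost = outcome_cost(opus, input_price=5, output_price=25)
fable_cost = outcome_cost(fable, input_price=10, output_price=50)
print(f"Opus: ${opus_cost:.2f}  Fable: ${fable_cost:.2f}")
```

Change attempts and oversight_usd on the Fable side to match the Opus side and the comparison flips, which is the whole point: the outcome depends on terms the price list never mentions.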

That this equation is real isn't news with Fable 5. Reasoning-model price lists already understate the true cost of operation by 5–30x, and in agentic workloads input tokens outnumber output tokens 20–25x. The Tokenomics paper quantifies it academically. A single agentic task can easily burn more tokens than a week of chatting. Your cost model was lying already with existing models. Fable just made it visible, because the numbers on the price list are bigger.

But that also implies the opposite possibility: a 2x more expensive model can come out cheaper. If it cuts the number of attempts from three to one and reduces human oversight of the result, the second part of the equation drops more than the first one rises. The benchmarks suggest as much: SOTA nearly everywhere, the longest autonomously working Claude model yet, and Stripe says in Anthropic's announcement that it "compressed months of engineering work into days." Except suggest isn't prove. Nobody has measured it for your workload. Only you can.

What to do about it on Monday

No grand strategy, three steps:

  1. Measure the tokenizer delta. Run real prompts through count_tokens with model: "claude-fable-5" and compare input_tokens with input_tokens_prior_tokenizer. Free, half an hour of work, and you have your own number instead of "about +30%."
  2. Pilot one expensive task — twice. Pick the task where you currently pay the most in retries and human oversight, and run it on Fable once at lower and once at higher effort. Lower effort saves thinking tokens, higher effort raises the chance of first-try success; only measuring both runs will tell you which wins. Don't forget streaming, timeouts, and handling stop_reason: "refusal" with a fallback to Opus 4.8.
  3. Measure cost per completed outcome, not per token. What the finished output cost including attempts and review — on Opus and on Fable. Only then compare.
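Step 3 boils down to one division over your pilot logs. A sketch under assumptions: each logged attempt is a dict with the hypothetical keys api_usd, review_usd, and completed, and the sample rows are invented, not results.

```python
def cost_per_completed_outcome(runs):
    """Total spend across ALL attempts divided by the outcomes that finished.

    Failed attempts still count toward the cost; they just produce nothing.
    """
    total = sum(r["api_usd"] + r["review_usd"] for r in runs)
    completed = sum(1 for r in runs if r["completed"])
    if completed == 0:
        return float("inf")  # money spent, nothing delivered
    return total / completed


# Invented pilot logs, NOT measurements.
opus_runs = [
    {"api_usd": 1.5, "review_usd": 5.0, "completed": False},
    {"api_usd": 1.5, "review_usd": 5.0, "completed": False},
    {"api_usd": 1.5, "review_usd": 10.0, "completed": True},
]
fable_runs = [
    {"api_usd": 4.5, "review_usd": 5.0, "completed": True},
]

opus_cost = cost_per_completed_outcome(opus_runs)
fable_cost = cost_per_completed_outcome(fable_runs)
real_multiplier = fable_cost / opus_cost
print(f"Opus: ${opus_cost:.2f}  Fable: ${fable_cost:.2f}  multiplier: {real_multiplier:.2f}x")
```

The multiplier this prints is only as good as the logs behind it, so record review time and failed attempts during the pilot, not afterwards from memory.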

Once you have those three numbers, you'll find there's nothing left of that two on the price list. Your real multiplier might be four or zero point eight — and only with that number does it make sense to decide whether to migrate.