Anthropic Admitted a Month of Claude Code Degradation — Here's What Actually Happened

From March 4 to April 20, 2026, three concurrent degradation modes ran in Claude Code. Anthropic frames them as bugs — but two of the three were deliberate decisions.

Jakub Kontra
Developer

Since early March, something felt off. Claude Code stopped holding context across longer sessions and kept repeating steps it had already done. On top of that, it picked tools inconsistently and burned through usage limits faster. You deleted CLAUDE.md, rewrote prompts, tried falling back to a different model. GitHub issues kept piling up, but Anthropic stayed silent. It looked like only your setup was broken. It wasn't: from March 4 to April 20, 2026, three overlapping degradation modes were running in production, and only one of them was a real technical bug.

Timeline of the three overlapping causes

Three products were affected: Claude Code, Claude Agent SDK, and Claude Cowork. The Anthropic API itself was untouched. On April 2, 2026, Stella Laurenzo from AMD published a public telemetry dataset spanning 6,852 sessions, which showed Claude Code's reasoning depth had dropped by 67%. At that moment, the first of the three causes had already been in production for a month, and the second one for a week.

Three dates describe three concurrent causes:

  • Mar 4, 2026 — Anthropic ships Opus 4.6 and with it drops the default reasoning effort from high to medium.
  • Mar 26, 2026 — a caching optimization goes into production that will later turn out to be a bug.
  • Apr 16, 2026 — Opus 4.7 ships with a new verbosity cap in the system prompt.

And the rest of what happened in between:

  • Apr 2, 2026 — Laurenzo publishes telemetry with a 67% drop in reasoning depth.
  • Apr 7, 2026 — the reasoning effort default is reverted; the new values are xhigh for Opus 4.7 and high for the other models.
  • Apr 10, 2026 — the caching bug is fixed in version 2.1.101, i.e. 15 days after deployment.
  • Apr 13–14, 2026 — the problem escalates in the media.
  • Apr 20, 2026 — the verbosity cap quietly disappears in version 2.1.116.

Two more events belong not to the mechanics of the degradation but to its consequences: on April 21, 2026, Claude Code was quietly pulled from the Pro plan for some new signups, and on April 23, 2026, Anthropic published the postmortem and reset usage limits for subscribers.

Three causes, not one

Caching bug. It showed up exactly the way you saw it: the model lost its own reasoning context, repeated steps it had already done, picked tools inconsistently, burned through limits faster, and seemed to forget what it had decided ten minutes earlier. Technically, it was an optimization deployed on March 26, 2026 (API header clear_thinking_20251015 with parameter keep:1) that was supposed to clear thinking blocks only from inactive sessions, specifically those idle for more than an hour. In reality, it cleared the thinking context "every subsequent turn" for the rest of the session. The fix arrived on April 10, 2026 in version 2.1.101, fifteen days after deployment.
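
The postmortem doesn't publish the server-side code, so treat the following as a reconstruction: a minimal Python sketch, with all names hypothetical, of the gap between what the optimization was supposed to do and what it actually did.

```python
import time
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 3600  # intended cutoff: sessions idle for more than an hour


@dataclass
class Session:
    thinking_blocks: list[str] = field(default_factory=list)
    last_activity: float = field(default_factory=time.time)


def clear_thinking_intended(session: Session) -> None:
    # Intended behavior: drop cached thinking blocks only when the
    # session has been inactive for over an hour.
    if time.time() - session.last_activity > IDLE_THRESHOLD_S:
        session.thinking_blocks.clear()


def clear_thinking_actual(session: Session) -> None:
    # Behavior described in the postmortem: the idle condition effectively
    # never applied, so thinking context was cleared on every subsequent
    # turn -- exactly the "forgets what it decided ten minutes ago"
    # symptom users saw.
    session.thinking_blocks.clear()
```

The second function is one deleted condition away from the first, which is also why a test against the spec would only have caught it if it exercised an active session across consecutive turns.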

Reasoning effort default. When Opus 4.6 launched on March 4, 2026, Anthropic lowered the default reasoning effort from high to medium. The rationale was "slightly lower intelligence for significantly lower latency on most tasks". Anthropic itself later described it internally as a "bad tradeoff". Since the revert on April 7, 2026, the defaults are xhigh for Opus 4.7 and high for the rest.
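
One practical implication: if the default can silently change between model releases, pinning the effort per request is the defensive move. The postmortem doesn't document the exact field, so this is a hypothetical sketch: the effort parameter name, its "high" value, and the model ID are assumptions, while the endpoint and headers follow Anthropic's public Messages API.

```python
import os

import requests

# Hypothetical: the "effort" request field and the model ID are assumptions
# based on the article's wording, not confirmed API surface.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    "effort": "high",  # pinned explicitly instead of inheriting the default
    "messages": [{"role": "user", "content": "Refactor this module."}],
}

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
    },
    json=payload,
    timeout=60,
)
print(resp.status_code)
```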

Verbosity cap. The Opus 4.7 system prompt gained text that capped commentary between tool commands at 25 words and the final answer at 100 words. The cap was a reaction to Opus 4.7 feeling too talkative internally, but according to Anthropic's postmortem from April 23, 2026, broader tests showed it cost a 3% drop in code generation quality. The cap hit production on April 16, 2026 and disappeared four days later in version 2.1.116.

Bug versus decision

At this point, the postmortem stops holding up. Only the caching issue was a real bug — the code didn't behave according to spec, and after the fix it returned to its intended logic. The other two incidents are something else.

Lowering the reasoning effort default from high to medium was an intentional tradeoff, one Anthropic even publicly justified with the quote above. The verbosity cap in the system prompt was a deliberate change shipped with full awareness that it shortened output; the ablation test that would have caught the 3% drop in code generation quality simply didn't run before rollout.

Two of the three "incidents" were deliberate decisions to reduce compute: put simply, how many GPU-seconds the model actually gets per request and how many response tokens it sends back. Only one was a bug. The postmortem from April 23, 2026 files all three under the single heading "degraded quality", and that's not a stylistic detail. It blurs the distinction between an unintended bug and a deliberate reduction in intelligence.

The motivation isn't exactly a mystery. Fortune reported on April 14, 2026 that OpenAI internally describes Anthropic as operating on a "substantially smaller curve" in compute capacity relative to growing demand. Anthropic didn't comment. That gives context for why deliberate cuts to latency and output volume would come right now. The quiet removal of Claude Code from the Pro plan on April 21, 2026 points the same way.

Why no one inside saw it

Anthropic's internal investigation took, according to its own postmortem, "more than a week" after active examination began. Dogfooding teams weren't using the production build; they were running on different infrastructure, with server-side message queuing and a modified thinking display. The caching bug, which only manifested under production routing and normal session length, couldn't be reproduced inside the company. Anthropic itself argues that the bugs were hard to reproduce outside production. That isn't an excuse; it's a diagnosis of a structural blind spot: a team that doesn't test its own product on the same path as the customer doesn't see what the customer sees. Users had been reporting this since early March; the official admission in the April 23 postmortem came roughly seven weeks later.

What Anthropic promised and what it didn't

In the April 23, 2026 postmortem, Anthropic promises the following:

  • dogfooding teams will switch to public builds instead of isolated test versions;
  • enhanced code review tools will be deployed internally and externally;
  • all system-prompt changes will go through ablation testing per model;
  • model-specific changes will stay isolated;
  • tradeoffs of the "intelligence vs. other metrics" kind will get gradual rollouts;
  • communication will run through the official @ClaudeDevs on X and GitHub.

One item is missing from the list: a commitment to announce deliberate product changes affecting quality in advance. That is exactly what would have addressed two of the three incidents that just happened. An ablation test will catch a 3% drop, but it won't tell you up front that Anthropic lowered the default reasoning effort because of latency and the compute budget. Anthropic pulled the same maneuver back in September 2025, when it similarly filed a routing bug and a tightening of the token budget for long contexts under a single label.
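
For the record, the ablation testing Anthropic now promises is not exotic. Here is a minimal sketch of what a per-change system-prompt ablation could look like; the eval runner and pass rates are simulated so the example stays self-contained, with the simulated drop mirroring the 3% the postmortem reports.

```python
import random
import statistics

BASE_PROMPT = "You are a coding agent."
CANDIDATE_PROMPT = BASE_PROMPT + " Keep commentary under 25 words."


def run_coding_eval(system_prompt: str, n_tasks: int = 500) -> float:
    # Stand-in for a real benchmark runner: pretend the verbosity cap
    # costs ~3 points of pass rate, as the postmortem later measured.
    pass_rate = 0.75 if "25 words" in system_prompt else 0.78
    return statistics.mean(random.random() < pass_rate for _ in range(n_tasks))


baseline = run_coding_eval(BASE_PROMPT)
candidate = run_coding_eval(CANDIDATE_PROMPT)
delta = candidate - baseline
print(f"baseline={baseline:.3f} candidate={candidate:.3f} delta={delta:+.3f}")

# Even a gate this blunt would have blocked the cap before rollout:
if delta < -0.01:
    print("gate: block rollout, system-prompt change degrades code generation")
```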

What to take from this

If you're paying for Max or Pro, one practical thing helps more than anything else: stop relying on the postmortem as the single source of truth and start keeping your own telemetry. Laurenzo showed that 6,852 samples are enough for an unambiguous signal; for you, it's enough to log the number of turns, the tool calls, how many times the model returned to something it had already solved, and which build each session ran on. A pinned version of Claude Code that you've tested on real work also buys you time to decide whether you want an upgrade; the differences between 2.1.101 and 2.1.116 aren't cosmetic, and without versioned data you can't tell whether the model got worse or your project did.
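
A minimal sketch of that logging, assuming nothing about your setup: one JSONL line per session with the fields named above. Every field name here is my own invention; the point is append-only, versioned data you control.

```python
import json
import time
from dataclasses import asdict, dataclass
from pathlib import Path

LOG = Path("claude_sessions.jsonl")


@dataclass
class SessionRecord:
    build: str           # pinned Claude Code version, e.g. "2.1.101"
    model: str           # model the session reported using
    turns: int           # user/assistant turns in the session
    tool_calls: int      # total tool invocations
    repeated_steps: int  # times the model redid something already solved
    started_at: float    # Unix timestamp, for correlating with releases


def log_session(record: SessionRecord) -> None:
    # Append-only JSONL: one line per session, trivial to diff across builds.
    with LOG.open("a") as f:
        f.write(json.dumps(asdict(record)) + "\n")


log_session(SessionRecord(
    build="2.1.101",
    model="claude-opus-4-7",
    turns=42,
    tool_calls=17,
    repeated_steps=3,
    started_at=time.time(),
))
```

A few weeks of that is enough to answer "did the upgrade make it worse" with your own numbers instead of someone else's postmortem.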

The second point isn't about tooling but about how to read a postmortem. The April 23, 2026 text is factually useful, but it's also a rhetorical operation: it blurs the line between "we broke it by accident" and "we switched off part of the model's intelligence because the compute math, the ratio between available GPU capacity and what we promised users, doesn't work out for us". If you're paying for Max, that exact distinction is the one you'll want to track yourself next time, because nobody else is going to do it for you.