Anthropic’s Claude Code burned through 5-hour quotas in 19 minutes. It was a silent cache downgrade.
Anthropic's Claude Code crisis saw the quotas of $200 'Max' plans drain in 19 minutes due to a silent cache downgrade and a token-accounting bug. Here is the technical post-mortem.

The launch of Claude Code was supposed to be Anthropic’s definitive answer to the friction of modern software engineering. Billed as a flagship agentic CLI, it promised to turn high-level intent into multi-file PRs while developers sipped coffee, or perhaps just stared at the terminal in awe. For many 'Max' subscribers paying upwards of $200 a month, however, the coffee didn't even have time to cool before their terminal screens flashed a quota exhaustion warning. Five-hour allotments—the theoretical limit of a standard professional session—were vanishing in under 20 minutes, leaving power users with a stalled agent and a very expensive bill for what amounted to little more than a directory scan.
The Claude Code quota crisis was a systemic failure caused by the intersection of a silent 91% reduction in prompt cache TTL and a critical token accounting bug, proving that the economic viability of agentic AI is currently precarious and vulnerable to undocumented infrastructure shifts. While the marketing suggested a seamless transition to Agentic Coding—an AI framework where tools operate autonomously within a local filesystem to execute commands and edits without continuous human prompting—the underlying reality was a fragile ecosystem of server-side shortcuts that failed the moment the pressure was applied.
What happened: The 19-Minute Drain
In late March 2026, the developer community began logging receipts that contradicted Anthropic’s promised usage limits. Users who had barely initialized their projects reported that their daily quotas were hitting zero before they could even finish a single feature implementation. According to reports from Forbes, these glitching limits were breaking developer workflows globally, turning a productivity tool into a source of billing anxiety.
The technical "perfect storm" that caused this drain consisted of two primary failures. First, there was the "Stealth Nerf." On March 6, 2026, Anthropic silently reduced Claude's prompt cache TTL (Time To Live)—the duration cached tokens remain valid on the server before being purged—from 60 minutes down to just 5 minutes. This change, documented by ByteIota, meant that any prompt-caching benefit was lost if a developer took more than a five-minute break between commands.
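To make the TTL change concrete, here is a minimal sketch of how billed tokens change when cache expiry drops from 60 minutes to 5. The context size, the cache-read discount, and the session shape are all illustrative assumptions, not Anthropic's actual rates:

```python
# Sketch: how a shorter prompt-cache TTL turns ordinary pauses into
# full-price context re-reads. All rates and sizes here are illustrative
# assumptions, not Anthropic's actual pricing.

CONTEXT_TOKENS = 50_000      # repository context resent on every call
CACHE_READ_DISCOUNT = 0.10   # assumed discounted rate for cache hits

def tokens_billed(gaps_minutes, ttl_minutes):
    """Bill the discounted rate while the cache is warm; bill a full
    context re-read whenever the gap since the last call exceeds the TTL."""
    billed = CONTEXT_TOKENS              # first call writes the cache
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            billed += int(CONTEXT_TOKENS * CACHE_READ_DISCOUNT)
        else:
            billed += CONTEXT_TOKENS     # cache expired: cold re-read
    return billed

# A session with quick tool calls and two longer breaks (8 and 12 minutes).
gaps = [1, 2, 8, 1, 1, 12, 2]
print(tokens_billed(gaps, ttl_minutes=60))  # 85000  -- old 1-hour TTL
print(tokens_billed(gaps, ttl_minutes=5))   # 175000 -- new 5-minute TTL
```

Under this toy model, the same session more than doubles in billed tokens after the TTL cut; the longer the pauses, the worse the multiplier.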
Second, a devastating accounting error emerged in the system’s logic. GitHub issue #45756 detailed how cache_read tokens, which are supposed to be billed at a significant discount, were instead being counted at 100% of the usage rate limit. Essentially, the very mechanism designed to make agentic AI affordable was being treated as a full-price expense by the quota tracker. For an agent constantly reading and re-reading a 50,000-line repository, this bug acted as a token bonfire.
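A hedged sketch of the accounting failure follows. Only the 100%-counting behaviour comes from the issue report; the quota size and the "correct" cache-read weight are assumptions chosen for illustration:

```python
# Sketch of the bug described in GitHub issue #45756: cache_read tokens
# counted against the usage quota at full weight instead of a discounted
# weight. Quota size and the 0.10 weight are illustrative assumptions.

QUOTA_TOKENS = 1_000_000      # assumed size of a 5-hour allotment
CACHE_READ_WEIGHT = 0.10      # assumed correct quota weight for cache reads

def quota_consumed(fresh_tokens, cache_read_tokens, bugged):
    """Fresh tokens always count fully; cache reads should count at a
    discount, but the bugged path weights them at 100%."""
    weight = 1.0 if bugged else CACHE_READ_WEIGHT
    return fresh_tokens + cache_read_tokens * weight

# An agent that sends 50k fresh tokens but re-reads 950k from cache:
fresh, cached = 50_000, 950_000
print(quota_consumed(fresh, cached, bugged=False))  # 145000.0 -- 14.5% of quota
print(quota_consumed(fresh, cached, bugged=True))   # 1000000.0 -- quota gone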
The reliability crisis was further compounded on April 1, 2026, when Anthropic accidentally leaked part of the internal source code for Claude Code due to human error, as confirmed by BBC News. The leak suggests that even the tool's creators were struggling with the complexity of their own infrastructure during this period of high-frequency failures.
The Counter-Argument: Early Access Growing Pains
There is, of course, a more charitable view. Defenders of Anthropic argue that the productivity gains of agentic coding still justify the premium cost, and that early-access bugs are an expected part of the innovation cycle. Anthropic’s Jarred Sumner defended the TTL shift on GitHub issue #46829, claiming the change actually makes Claude Code cheaper for "one-shot" requests where context isn't revisited. By lowering the write cost for the 5-minute tier to 1.25x (compared to the 2x cost of the 1-hour tier), Anthropic argues it is optimizing for the most common usage patterns.
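The one-shot framing holds only while the cache is written once. A quick back-of-the-envelope, using just the write multipliers cited in the GitHub thread with everything else normalized, shows where the break-even sits:

```python
# Write-cost comparison using the multipliers from the GitHub thread:
# the 5-minute tier writes at 1.25x the base rate, the 1-hour tier at 2x.
# If the short TTL expires k times in an hour, the short tier pays k writes.

def write_cost(tier_multiplier, writes):
    """Total cache-write cost in units of the base token rate."""
    return tier_multiplier * writes

for k in range(1, 4):
    short = write_cost(1.25, k)   # k re-writes within the hour
    long = write_cost(2.00, 1)    # one write covers the whole hour
    print(k, short, long, "short wins" if short < long else "long wins")
```

The short tier is cheaper only while the cache expires at most once per hour (break-even at 1.6 re-writes); any agentic session with two or more long pauses flips the advantage back to the old 1-hour pricing.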
However, "early access" does not excuse silent regressions in cost-mitigation features that were marketed as part of a $200/month value proposition. When a company changes the fundamental math of a subscription—reducing the value of a cache by 91% without notice—it isn't just a bug; it is an undocumented shift in the product's economic contract. This perspective, echoed in analyses like AI Product Weekly, suggests that developers cannot budget for tools that change their "burn rate" without warning.
Why it matters: The Fragile Economics of Autonomy
Agentic workflows are uniquely sensitive to caching because of the "Agentic Tax." Unlike a simple chat interface where a user asks one question and gets one answer, an agentic CLI must maintain a constant "live" view of the filesystem. It scans directories, reads configuration files, and analyzes imports repeatedly. Data analyzed by The Register suggests that February usage patterns with the 1-hour TTL generated only 1.1% overhead, whereas the March shift to a 5-minute TTL spiked overhead to 25.9%.
| Feature | Pre-March 6th Policy | Post-March 6th Policy |
|---|---|---|
| Cache TTL | 60 Minutes | 5 Minutes |
| Quota Calculation | Discounted Cache Rates | 100% Full-Price (Bugged) |
| Max Plan Value | ~5 Hours of Flow | ~19 Minutes of Flow |
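The Register's overhead figures can be illustrated with a toy model (our illustration, not The Register's methodology): define overhead as the tokens spent re-writing an expired cache, as a fraction of all tokens billed for the session.

```python
# Toy model of cache-maintenance overhead: tokens burned re-writing an
# expired cache divided by total tokens billed. Context size and read
# discount are illustrative assumptions, not measured values.

def cache_overhead(gaps_minutes, ttl_minutes, context=50_000, read_rate=0.10):
    rewrites = sum(1 for g in gaps_minutes if g > ttl_minutes)
    hits = len(gaps_minutes) - rewrites
    rewrite_tokens = rewrites * context
    total = context + rewrite_tokens + int(hits * context * read_rate)
    return rewrite_tokens / total

# The same session measured under both policies:
gaps = [1, 2, 8, 1, 1, 12, 2, 30, 3]
print(round(cache_overhead(gaps, ttl_minutes=60), 3))  # 0.0 -- nothing expires
print(round(cache_overhead(gaps, ttl_minutes=5), 3))   # 0.652 -- most spend is churn
```

The exact percentages depend on session shape, but the direction matches the reported data: under a long TTL, cache churn is negligible; under a 5-minute TTL, re-writing the cache can dominate the bill.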
Without persistent caching, the cost of these operations compounds with every pass instead of being amortized across the session. Every time the agent checks a file to ensure a change didn't break a dependency, it is effectively re-reading the entire project context. If the TTL is only five minutes, a developer who pauses to read a Stack Overflow post returns to a "cold" cache. One developer documented over $2,530 in overpayments across 119,000 API calls due to this specific inefficiency.
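That reported overpayment works out to a small per-call tax that compounds brutally at agentic call volumes:

```python
# Arithmetic check on the reported figure: $2,530 across 119,000 API calls.
overpaid_usd, calls = 2_530, 119_000
per_call = overpaid_usd / calls
print(round(per_call, 4))           # 0.0213 -- USD overpaid per API call
print(round(per_call * 10_000, 2))  # 212.61 -- USD per 10k calls
```

Two cents per call sounds trivial until an autonomous agent makes tens of thousands of calls without a human watching the meter.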
This "Opaque Infrastructure Problem" is the true dealbreaker for enterprise adoption. As noted in reports on TechBuzz, the AI industry is currently mirroring the aggressive, undocumented pivots seen in the social media API space. Anthropic eventually acknowledged that users were hitting limits "way faster than expected," but the damage to the developer-tool relationship was already done. The 19-minute drain represents a 93% reduction in promised value for Max subscribers, a margin of error that would be unacceptable in any other professional service industry.
What's next: Rebuilding the Trust Cache
Anthropic has since moved to address the "math" of the token bug, ensuring that cache_read tokens are no longer draining quotas at the full-price rate. The underlying infrastructure shift persists, however: the TTL is still capped at the reduced five-minute limit. This suggests that the cost of maintaining massive, persistent prompt caches for thousands of developers simultaneously is a burden Anthropic is currently unwilling to bear.
The incident highlights a desperate need for transparent Service Level Agreements (SLAs) in the AI space. If "Usage Limits" can be effectively slashed by 90% through a silent configuration change on a Friday afternoon, then "Professional" plans are professional in name only. Developers are currently being used as the stress-testers for Anthropic's infrastructure growing pains, paying a premium for the privilege of watching their quotas evaporate in real-time.
While the "top priority" fix has stabilized the counting bug, the permanent reduction in cache TTL remains a "stealth nerf" that fundamentally alters the efficiency of the tool.
Conclusion: The Stealth Nerf as Permanent Policy
Returning to the initial thesis, the evidence from the March 2026 crisis confirms that the economic viability of agentic AI is currently precarious. The combination of a 91% TTL reduction documented by ByteIota and the token counting errors reported on GitHub demonstrates a system where the "math" of AI pricing is still being written in pencil.
Anthropic may have fixed the accounting error that caused the 19-minute drain, but by keeping the cache TTL at five minutes, they have signaled that the high resource intensity of agentic coding is a cost they intend to pass directly to the user. The "19-minute drain" was a symptom of a much larger problem: professional AI is currently too expensive, and too unpredictable, to displace the tools it aims to replace. For now, the "trust cache" remains empty, and power users may find themselves looking toward self-hosted or open-source alternatives where the TTL isn't subject to a silent, mid-month downgrade.