Anthropic’s Claude Code burned through 5-hour quotas in 19 minutes. It was a silent cache downgrade.
Anthropic's Claude Code crisis saw the quotas of $200 'Max' plans drain in 19 minutes due to a silent cache downgrade and a token-accounting bug. Here is the technical post-mortem.

The launch of Claude Code was supposed to be Anthropic’s definitive answer to the friction of modern software engineering. Billed as a flagship agentic CLI, it promised to turn high-level intent into multi-file PRs while developers sipped coffee, or perhaps just stared at the terminal in awe. For many 'Max' subscribers paying upwards of $200 a month, however, the coffee didn't even have time to cool before their terminal screens flashed a quota exhaustion warning. Five-hour allotments—the theoretical limit of a standard professional session—were vanishing in under 20 minutes, leaving power users with a stalled agent and a very expensive bill for what amounted to little more than a directory scan.
The Claude Code quota crisis was a systemic failure caused by the intersection of a silent 91% reduction in prompt cache TTL and a critical token accounting bug, proving that the economic viability of agentic AI is currently precarious and vulnerable to undocumented infrastructure shifts. While the marketing suggested a seamless transition to Agentic Coding—an AI framework where tools operate autonomously within a local filesystem to execute commands and edits without continuous human prompting—the underlying reality was a fragile ecosystem of server-side shortcuts that failed the moment the pressure was applied.
What happened: The 19-Minute Drain
In late March 2026, the developer community began logging receipts that contradicted Anthropic’s promised usage limits. Users who had barely initialized their projects reported that their daily quotas were hitting zero before they could even finish a single feature implementation. According to reports from Forbes, these glitching limits were breaking developer workflows globally, turning a productivity tool into a source of billing anxiety.
The technical "perfect storm" that caused this drain consisted of two primary failures. First, there was the "Stealth Nerf." On March 6, 2026, Anthropic silently reduced Claude's prompt cache TTL (Time To Live)—the duration cached tokens remain valid on the server before being purged—from 60 minutes down to just 5 minutes. This change, documented by ByteIota, meant that any prompt-caching benefit was lost if a developer took more than a five-minute break between commands.
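To make the TTL change concrete, here is a minimal sketch of how billed tokens change when cache expiry drops from 60 minutes to 5. The context size, the cache-read discount, and the session shape are all illustrative assumptions, not Anthropic's actual rates:

```python
# Sketch: how a shorter prompt-cache TTL turns ordinary pauses into
# full-price context re-reads. All rates and sizes here are illustrative
# assumptions, not Anthropic's actual pricing.

CONTEXT_TOKENS = 50_000      # repository context resent on every call
CACHE_READ_DISCOUNT = 0.10   # assumed discounted rate for cache hits

def tokens_billed(gaps_minutes, ttl_minutes):
    """Bill the discounted rate while the cache is warm; bill a full
    context re-read whenever the gap since the last call exceeds the TTL."""
    billed = CONTEXT_TOKENS              # first call writes the cache
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            billed += int(CONTEXT_TOKENS * CACHE_READ_DISCOUNT)
        else:
            billed += CONTEXT_TOKENS     # cache expired: cold re-read
    return billed

# A session with quick tool calls and two longer breaks (8 and 12 minutes).
gaps = [1, 2, 8, 1, 1, 12, 2]
print(tokens_billed(gaps, ttl_minutes=60))  # 85000  -- old 1-hour TTL
print(tokens_billed(gaps, ttl_minutes=5))   # 175000 -- new 5-minute TTL
```

Under this toy model, the same session more than doubles in billed tokens after the TTL cut; the longer the pauses, the worse the multiplier.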
Second, a devastating accounting error emerged in the system’s logic. GitHub issue #45756 detailed how cache_read tokens, which are supposed to be billed at a significant discount, were instead being counted at 100% of the usage rate limit. Essentially, the very mechanism designed to make agentic AI affordable was being treated as a full-price expense by the quota tracker. For an agent constantly reading and re-reading a 50,000-line repository, this bug acted as a token bonfire.
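A hedged sketch of the accounting failure follows. Only the 100%-counting behaviour comes from the issue report; the quota size and the "correct" cache-read weight are assumptions chosen for illustration:

```python
# Sketch of the bug described in GitHub issue #45756: cache_read tokens
# counted against the usage quota at full weight instead of a discounted
# weight. Quota size and the 0.10 weight are illustrative assumptions.

QUOTA_TOKENS = 1_000_000      # assumed size of a 5-hour allotment
CACHE_READ_WEIGHT = 0.10      # assumed correct quota weight for cache reads

def quota_consumed(fresh_tokens, cache_read_tokens, bugged):
    """Fresh tokens always count fully; cache reads should count at a
    discount, but the bugged path weights them at 100%."""
    weight = 1.0 if bugged else CACHE_READ_WEIGHT
    return fresh_tokens + cache_read_tokens * weight

# An agent that sends 50k fresh tokens but re-reads 950k from cache:
fresh, cached = 50_000, 950_000
print(quota_consumed(fresh, cached, bugged=False))  # 145000.0 -- 14.5% of quota
print(quota_consumed(fresh, cached, bugged=True))   # 1000000.0 -- quota gone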
The reliability crisis was further compounded on April 1, 2026, when Anthropic accidentally leaked part of the internal source code for Claude Code due to human error, as confirmed by BBC News. The leak suggests that even the tool's creators were struggling with the complexity of their own infrastructure during this period of high-frequency failures.
The Counter-Argument: Early Access Growing Pains
There is, of course, a more charitable view. Defenders of Anthropic argue that the productivity gains of agentic coding still justify the premium cost, and that early-access bugs are an expected part of the innovation cycle. Anthropic’s Jarred Sumner defended the TTL shift on GitHub issue #46829, claiming the change actually makes Claude Code cheaper for "one-shot" requests where context isn't revisited. By lowering the write cost for the 5-minute tier to 1.25x (compared to the 2x cost of the 1-hour tier), Anthropic argues it is optimizing for the most common usage patterns.
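The one-shot framing holds only while the cache is written once. A quick back-of-the-envelope, using just the write multipliers cited in the GitHub thread with everything else normalized, shows where the break-even sits:

```python
# Write-cost comparison using the multipliers from the GitHub thread:
# the 5-minute tier writes at 1.25x the base rate, the 1-hour tier at 2x.
# If the short TTL expires k times in an hour, the short tier pays k writes.

def write_cost(tier_multiplier, writes):
    """Total cache-write cost in units of the base token rate."""
    return tier_multiplier * writes

for k in range(1, 4):
    short = write_cost(1.25, k)   # k re-writes within the hour
    long = write_cost(2.00, 1)    # one write covers the whole hour
    print(k, short, long, "short wins" if short < long else "long wins")
```

The short tier is cheaper only while the cache expires at most once per hour (break-even at 1.6 re-writes); any agentic session with two or more long pauses flips the advantage back to the old 1-hour pricing.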
However, "early access" does not excuse silent regressions in cost-mitigation features that were marketed as part of a $200/month value proposition. When a company changes the fundamental math of a subscription—reducing the value of a cache by 91% without notice—it isn't just a bug; it is an undocumented shift in the product's economic contract. This perspective, echoed in analyses like AI Product Weekly, suggests that developers cannot budget for tools that change their "burn rate" without warning.
Why it matters: The Fragile Economics of Autonomy
Agentic workflows are uniquely sensitive to caching because of the "Agentic Tax." Unlike a simple chat interface where a user asks one question and gets one answer, an agentic CLI must maintain a constant "live" view of the filesystem. It scans directories, reads configuration files, and analyzes imports repeatedly. Data analyzed by The Register suggests that February usage patterns with the 1-hour TTL generated only 1.1% overhead, whereas the March shift to a 5-minute TTL spiked overhead to 25.9%.
| Feature | Pre-March 6th Policy | Post-March 6th Policy |
|---|---|---|
| Cache TTL | 60 Minutes | 5 Minutes |
| Quota Calculation | Discounted Cache Rates | 100% Full-Price (Bugged) |
| Max Plan Value | ~5 Hours of Flow | ~19 Minutes of Flow |
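The Register's overhead figures can be illustrated with a toy model (our illustration, not The Register's methodology): define overhead as the tokens spent re-writing an expired cache, as a fraction of all tokens billed for the session.

```python
# Toy model of cache-maintenance overhead: tokens burned re-writing an
# expired cache divided by total tokens billed. Context size and read
# discount are illustrative assumptions, not measured values.

def cache_overhead(gaps_minutes, ttl_minutes, context=50_000, read_rate=0.10):
    rewrites = sum(1 for g in gaps_minutes if g > ttl_minutes)
    hits = len(gaps_minutes) - rewrites
    rewrite_tokens = rewrites * context
    total = context + rewrite_tokens + int(hits * context * read_rate)
    return rewrite_tokens / total

# The same session measured under both policies:
gaps = [1, 2, 8, 1, 1, 12, 2, 30, 3]
print(round(cache_overhead(gaps, ttl_minutes=60), 3))  # 0.0 -- nothing expires
print(round(cache_overhead(gaps, ttl_minutes=5), 3))   # 0.652 -- most spend is churn
```

The exact percentages depend on session shape, but the direction matches the reported data: under a long TTL, cache churn is negligible; under a 5-minute TTL, re-writing the cache can dominate the bill.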
Without persistent caching, the cost of these operations compounds with every pass instead of being amortized across the session. Every time the agent checks a file to ensure a change didn't break a dependency, it is effectively re-reading the entire project context. If the TTL is only five minutes, a developer who pauses to read a Stack Overflow post returns to a "cold" cache. One developer documented over $2,530 in overpayments across 119,000 API calls due to this specific inefficiency.
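That reported overpayment works out to a small per-call tax that compounds brutally at agentic call volumes:

```python
# Arithmetic check on the reported figure: $2,530 across 119,000 API calls.
overpaid_usd, calls = 2_530, 119_000
per_call = overpaid_usd / calls
print(round(per_call, 4))           # 0.0213 -- USD overpaid per API call
print(round(per_call * 10_000, 2))  # 212.61 -- USD per 10k calls
```

Two cents per call sounds trivial until an autonomous agent makes tens of thousands of calls without a human watching the meter.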
This "Opaque Infrastructure Problem" is the true dealbreaker for enterprise adoption. As noted in reports on TechBuzz, the AI industry is currently mirroring the aggressive, undocumented pivots seen in the social media API space. Anthropic eventually acknowledged that users were hitting limits "way faster than expected," but the damage to the developer-tool relationship was already done. The 19-minute drain represents a 93% reduction in promised value for Max subscribers, a margin of error that would be unacceptable in any other professional service industry.
What's next: Rebuilding the Trust Cache
Anthropic has since moved to address the "math" of the token bug, ensuring that cache_read tokens are no longer draining quotas at the full-price rate. The underlying infrastructure shift persists, however: the TTL is still capped at the reduced five-minute limit. This suggests that the cost of maintaining massive, persistent prompt caches for thousands of developers simultaneously is a burden Anthropic is currently unwilling to bear.
The incident highlights a desperate need for transparent Service Level Agreements (SLAs) in the AI space. If "Usage Limits" can be effectively slashed by 90% through a silent configuration change on a Friday afternoon, then "Professional" plans are professional in name only. Developers are currently being used as the stress-testers for Anthropic's infrastructure growing pains, paying a premium for the privilege of watching their quotas evaporate in real-time.
While the "top priority" fix has stabilized the counting bug, the permanent reduction in cache TTL remains a "stealth nerf" that fundamentally alters the efficiency of the tool.
Conclusion: The Stealth Nerf as Permanent Policy
Returning to the initial thesis, the evidence from the March 2026 crisis confirms that the economic viability of agentic AI is currently precarious. The combination of a 91% TTL reduction documented by ByteIota and the token counting errors reported on GitHub demonstrates a system where the "math" of AI pricing is still being written in pencil.
Anthropic may have fixed the accounting error that caused the 19-minute drain, but by keeping the cache TTL at five minutes, they have signaled that the high resource intensity of agentic coding is a cost they intend to pass directly to the user. The "19-minute drain" was a symptom of a much larger problem: professional AI is currently too expensive, and too unpredictable, to displace the tools it aims to replace. For now, the "trust cache" remains empty, and power users may find themselves looking toward self-hosted or open-source alternatives where the TTL isn't subject to a silent, mid-month downgrade.