OpenAI says reciting paywalled articles is fair use. The New York Times disagrees.
Major publishers and music labels are suing AI giants like OpenAI and Anthropic for copyright infringement. Will the 'fair use' defense hold up in court?

Major tech companies constructed their foundational models by scraping billions of words and images from the public internet without compensating the original creators. This data collection strategy powers generative AI — technologies that train on vast quantities of preexisting human-authored works and use inferences from that training to generate new content. Now, a coordinated legal assault from entities like The New York Times and Universal Music Group threatens to dismantle that entire technical and business foundation. The defining battle of the modern technological era will not be fought over algorithmic capabilities, but over the legal boundaries of data ingestion.
AI companies' reliance on the fair use doctrine to justify unauthorized web scraping is legally vulnerable because the models demonstrably regurgitate verbatim copyrighted content, functioning as direct market substitutes for the original works. As media conglomerates, publishers, and music labels pool their resources to file suit against these operations, the legal risk is transitioning from theoretical to existential.
The Multibillion-Dollar Legal Wave
In December 2023, The New York Times escalated the conflict by suing OpenAI and Microsoft for copyright infringement. As The Verge reported, the lawsuit alleges that the companies' models threaten high-quality journalism by free-riding on publishers' massive financial investments. By generating output that recites Times content verbatim, the suit argues, the AI developers are directly depriving the paper of subscription and licensing revenue.
This action catalyzed a broader legal movement across the media industry. In April 2024, eight major US newspapers, including the Chicago Tribune and New York Daily News, filed a coordinated lawsuit against OpenAI for the unauthorized harvesting of their articles. These publications are owned by the hedge fund Alden Global Capital, a sign that the financial entities backing legacy media are no longer willing to tolerate uncompensated scraping. According to federal court filings, the suit demands a jury trial and seeks compensation for the alleged unauthorized use of millions of articles.
In late 2024, Canadian news outlets launched a similar offensive, arguing that OpenAI systematically breached their terms of service and bypassed technological protections to harvest content. This international expansion of legal claims indicates that the tech industry cannot simply outrun copyright challenges by shifting operations across borders.
The music industry is mounting a parallel offensive against other AI developers. Universal Music Group, Concord Music Group, and ABKCO filed a lawsuit against Anthropic over the unauthorized use of copyrighted song lyrics. According to Music Business Worldwide, the lawsuit specifically covers over 20,000 works.
Because statutory copyright damages in the United States can reach up to $150,000 per infringed work, Anthropic faces a potential liability that plausibly exceeds $3 billion. This astronomical figure isn't merely a theoretical maximum; it represents a strategic calculation by the music industry to force a licensing agreement. If courts find that Anthropic engaged in willful infringement, the financial penalty could immediately render the AI startup insolvent, effectively establishing a precedent that unauthorized ingestion carries a terminal business risk.
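To make the arithmetic concrete, here is a back-of-the-envelope sketch of that exposure. The only inputs are the roughly 20,000 works cited in the suit and the US statutory range, which runs from $750 per work up to $150,000 per work for willful infringement.

```python
# Back-of-the-envelope statutory damages exposure (illustrative only).
# Inputs: the ~20,000 works cited in the Anthropic lyrics suit and the
# US statutory range of $750 to $150,000 per infringed work.
WORKS = 20_000
STATUTORY_MIN = 750        # statutory minimum per infringed work
STATUTORY_MAX = 150_000    # statutory maximum per work for willful infringement

low_end = WORKS * STATUTORY_MIN    # $15,000,000
high_end = WORKS * STATUTORY_MAX   # $3,000,000,000

print(f"Low-end exposure:  ${low_end:,}")
print(f"High-end exposure: ${high_end:,}")
```

Even the statutory floor produces an exposure in the tens of millions of dollars, which helps explain why suits like this generate settlement pressure well before any finding of willfulness.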
The Fair Use Defense on Trial
The core of the AI industry's legal defense rests on a highly specific interpretation of fair use — a US legal doctrine that permits limited use of copyrighted material without first acquiring permission from the copyright holder, evaluated on four statutory factors: the purpose and character of the use, the nature of the copyrighted work, the amount used, and the effect on the market for the original. AI executives confidently assert that training models on publicly available data inherently constitutes fair use under this framework. Tech firms draw analogies to search engines, arguing that just as Google parses web pages to create an index, AI models analyze text to understand statistical relationships between words. They contend that this intermediate copying is transformative because the resulting model serves a completely different purpose than the original articles or songs.
However, plaintiffs argue that AI models are not merely learning from the data in a transformative manner. Instead, they are functioning as direct commercial competitors. The key technical behavior undermining the tech companies' defense is regurgitation — the phenomenon where a generative AI model outputs near-exact copies of its copyrighted training data in response to user prompts.
If an AI system can generate a verbatim copy of a paywalled New York Times article or the exact lyrics to a Universal Music Group song, it fundamentally alters the "market effect" analysis of the fair use test. It ceases to be an abstract research tool and becomes a direct market substitute. When users can bypass a subscription paywall because a chatbot will summarize or recite the identical information, or pull up licensed song lyrics on demand, the original creator loses the associated revenue and the economic harm becomes tangible and immediate.
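As an illustration of what "verbatim" means in practice, the following is a minimal, hypothetical sketch of how overlap between a chatbot's reply and a source article could be quantified by finding the longest run of words the two texts share. It is not the methodology used in any of these lawsuits, and the example strings are invented.

```python
# Hypothetical sketch: measure verbatim overlap between a source article
# and a model's output by finding the longest shared run of words.

def longest_common_run(source: str, output: str) -> int:
    """Return the length, in words, of the longest verbatim run of words
    appearing in both texts."""
    src_words, out_words = source.split(), output.split()
    best = 0
    # Dynamic programming over word positions (O(n*m), fine for article-length text).
    prev = [0] * (len(out_words) + 1)
    for i in range(1, len(src_words) + 1):
        curr = [0] * (len(out_words) + 1)
        for j in range(1, len(out_words) + 1):
            if src_words[i - 1] == out_words[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

article = "the committee voted to approve the measure after a lengthy debate"
reply   = "sources said the committee voted to approve the measure on Tuesday"
print(longest_common_run(article, reply))  # -> 7
```

A long shared run points to recitation rather than paraphrase, and that distinction is exactly what the market-substitute argument turns on.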
A Bug or a Feature?
The tech giants do not accept the regurgitation narrative without pushback. In its public response to the litigation, OpenAI claims that verbatim regurgitation is a rare bug, arguing that The New York Times artificially "hacked" prompts to force ChatGPT into reciting articles in a way that does not reflect normal user behavior.
The company also framed the legal challenge as a purely financial maneuver. "The New York Times says this lawsuit is about protecting journalism and principles," the OpenAI blog stated. "In reality, it's about their lack of principles in pursuit of pure business interests. We've been consistent in our support for journalism, the long-established principles of fair use, and the Constitution's promise of a more open, competitive future for sharing knowledge."
Plaintiffs, however, argue that this defense is fundamentally a deflection. They maintain that the capacity for regurgitation inherently proves the unauthorized, wholesale memorization of copyrighted works during the training phase. Regardless of the specific prompting tricks required to surface the data, ingesting the material without a license in the first place constitutes the primary infringement. If the model can recite the article, the model has memorized the article.
A Precedent That Could Break the AI Business
The stakes of these lawsuits extend far beyond traditional corporate fines. The music publishers' suit against Anthropic adopted a notably aggressive strategy by naming the company's CEO and co-founder personally. This suggests plaintiffs are actively exploring the threat of individual executive liability, an escalation designed to force settlements and change operational behavior at the top level of management.
If the fair use defense fails in federal court, the operational mechanics of the AI industry will face an immediate crisis. AI developers may be forced to retroactively license data, a logistical nightmare that contradicts their entire business model. Worse, courts could order them to delete existing foundational models trained on infringing materials, functionally erasing billions of dollars in computational investment. As the Universal Music Group lawsuit illustrates, applying the maximum statutory penalty to even a fraction of the internet's copyrighted material yields figures that dwarf the valuations of the most well-funded AI startups.
The Limits of Moving Fast
The evidence currently logged in federal dockets strongly supports the thesis that AI companies' reliance on the "fair use" doctrine is legally vulnerable. The documented instances of AI models regurgitating verbatim copyrighted content directly undermine the argument that these models are purely transformative tools. By demonstrating that an AI can function as a direct market substitute for original works — whether outputting a news analysis or a pop song lyric — the plaintiffs have identified the fatal weakness in the tech industry's legal armor.
If the courts find that generative AI models act primarily as market substitutes rather than tools of transformative knowledge-sharing, the fair use shield will shatter. AI companies may soon find that the era of moving fast and scraping things has yielded a multibillion-dollar legal debt they are finally being forced to pay.