A class action lawsuit was filed on August 19, 2024, in the United States District Court for the Northern District of California (San Francisco Division) by three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, against Anthropic PBC, the developer of the language model Claude. Anthropic was founded in January 2021 by seven former OpenAI employees, including its current CEO, Dario Amodei, who had been OpenAI's VP of Research. The company launched its first Claude model in March 2023, followed by Claude 2 in July 2023 and by Claude 3 and Claude 3.5 in 2024.
This lawsuit follows similar actions brought by authors against Meta Platforms, Microsoft, and OpenAI in various federal courts in 2023. Notably, the New York Times filed suit against OpenAI and Microsoft on December 27, 2023, in the U.S. District Court for the Southern District of New York, in Manhattan, alleging the unauthorized use of millions of articles to train ChatGPT. That suit highlighted the ability of AI models to reproduce news articles verbatim, potentially undermining the Times' subscription-based business model.
The plaintiffs, acting on behalf of themselves and a proposed class of thousands of authors, allege that Anthropic has engaged in large-scale copyright infringement by reproducing their literary works without authorization to train its artificial intelligence models. According to the complaint, "Anthropic has built a multi-billion dollar business by misappropriating hundreds of thousands of copyrighted books," and "rather than obtain authorization and pay a fair price for the creations it utilizes, Anthropic has used them without permission." The plaintiffs claim that Anthropic knowingly downloaded and copied pirated versions of their books, including via a dataset known as "The Pile," which includes "Books3," a collection of nearly 200,000 books sourced from the site Bibliotik. The complaint emphasizes that EleutherAI, the group that developed "The Pile," explicitly acknowledged that "Books3 consists almost entirely of copyrighted works."
This unauthorized exploitation is alleged to have enabled Anthropic to develop its Claude model and generate substantial revenue, with projections exceeding $850 million for 2024. The plaintiffs base their claim on the infringement of their exclusive rights of reproduction and distribution under section 106 of Title 17 of the U.S. Code, and specifically invoke section 501 of the same title, which governs copyright infringement. They contend that Anthropic, like other AI companies such as OpenAI, Google, and Meta, could have obtained licenses at the cost of "hundreds of millions of dollars to reproduce copyrighted material for LLM training," referencing agreements made with Axel Springer, News Corporation, and the Associated Press.
The outcome of these lawsuits remains uncertain. Litigation between the media and creative industries and generative AI developers is likely to be resolved through licensing agreements similar to those OpenAI has concluded with Axel Springer and the Associated Press. Such agreements, potentially lucrative for rights holders, may herald a new economic model for the use of protected content by AI systems.
It is worth noting that no similar action has yet been filed in French courts, although the issue of unauthorized use of protected works for AI model training also arises under French law, particularly in view of provisions in the Intellectual Property Code relating to copyright and the exception for text and data mining.