June 26, 2025

Pirated Libraries and AI Training: A Federal Court’s Landmark Ruling on Fair Use in Bartz et al. v. Anthropic PBC

This legal summary concerns the order issued on June 23, 2025, by the United States District Court for the Northern District of California in the case of Andrea Bartz, Charles Graeber, Kirk Wallace Johnson, Bartz Inc., and MJ + KJ Inc. v. Anthropic PBC (No. C 24-05417 WHA). The plaintiffs, all authors or copyright holders, brought a copyright infringement action against Anthropic PBC, a company specializing in the development of generative artificial intelligence systems.

The decision, rendered by Judge William Alsup, addressed a motion for summary judgment filed by Anthropic prior to class certification. This motion focused exclusively on the applicability of the fair use doctrine under Section 107 of the Copyright Act to various acts of copying performed by Anthropic in the development and training of its large language models (LLMs), marketed under the name “Claude.”

Anthropic argued that its use of copyrighted books—some obtained lawfully and others downloaded from pirate sources—fell within the ambit of fair use because it was transformative and justified by the requirements of AI model training. The plaintiffs contested this position, particularly with regard to the use of pirated copies and the reproduction of their works without authorization.

The Court’s order constitutes the first substantive ruling in this litigation and provides a detailed analysis of the different categories of uses at issue—ranging from LLM training, to the digitization of purchased books, to the retention of pirated material in a permanent internal library—and evaluates each in light of the four fair use factors set forth in the Copyright Act.

1. Factual background

Anthropic PBC, an artificial intelligence firm founded in 2021 by former OpenAI employees, released several versions of its LLM “Claude” beginning in March 2023. To train its models, Anthropic assembled a centralized research library by acquiring millions of books through two main channels: first, by downloading over seven million pirated copies of books from illegal online sources such as Books3, LibGen, and PiLiMi; and second, by purchasing millions of print books (including used copies), which it then scanned destructively to produce digital copies.

The company used digital copies from this library, including works authored by the plaintiffs, to train its models. Internal communications showed that the company deliberately prioritized pirated sources to avoid the “legal/practice/business slog” of licensing, as co-founder and CEO Dario Amodei phrased it. The objective was to build and retain a permanent internal dataset of “all the books in the world” for use in training and possibly other future applications.

2. Claims and legal framework

The plaintiffs, three authors and their associated rights-holding entities, alleged unauthorized reproduction of their copyrighted works in violation of 17 U.S.C. § 106. The works had been reproduced multiple times in various forms during data ingestion, cleaning, tokenization, and training.

Anthropic did not dispute that these acts occurred, but instead sought summary judgment on the grounds that all relevant uses were protected by the fair use doctrine under Section 107 of the Copyright Act. The company asserted that its use was highly transformative, as the books were not exploited commercially as such, but used to develop a general-purpose AI that generates new and original text.

3. Legal analysis – Fair Use under 17 U.S.C. § 107

a) Use of copyrighted works for LLM training

Judge Alsup concluded that using copyrighted books to train LLMs like Claude constituted a “spectacularly transformative” use under the first statutory factor. He emphasized that:

“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”

The Court found that the plaintiffs had not alleged any infringing outputs generated by the Claude models. Moreover, Anthropic had implemented filters to ensure that direct excerpts from the plaintiffs’ works would not reach users. Accordingly, the training use was found to serve a distinct and non-substitutive purpose, favoring a finding of fair use.

b) Digitization of lawfully acquired print books

The Court held that Anthropic’s conversion of purchased physical books into digital versions for internal use also constituted fair use. Although the physical copies were destroyed in the process, each was replaced by a single corresponding digital copy used exclusively in Anthropic’s internal “research library.”

The Court compared this use to format-shifting and space-saving practices previously upheld in cases like Sony Betamax and Google Books. The digitization enabled searchability and storage efficiency without creating new expressive content or distributing the copies externally.

Judge Alsup stated:

“This print-to-digital conversion involved a different and narrower form of transformative use […] but [was] transformative for that reason alone.”

c) Retention and use of pirated copies

By contrast, the Court categorically rejected Anthropic’s reliance on fair use for its pirated acquisitions. While some of these copies were used in LLM training, many were not, and Anthropic retained all pirated books indefinitely, even when they were deemed unnecessary for training.

The Court stated unequivocally:

“Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.”

The Court emphasized that downloading unauthorized copies from sources like LibGen and Books3—when lawful avenues existed—was inherently infringing. This remained true even if the pirated books were later used for arguably transformative purposes. In doing so, Judge Alsup distinguished the case from Perfect 10, Texaco, and Google Books, emphasizing that the initial act of piracy could not be excused by downstream utility.

4. Scope and implications of the ruling

This decision draws a principled distinction among three categories of use in the context of AI training, only two of which qualify as fair use:

  • Training AI models using lawfully acquired works—whether purchased in print or in digital form—may constitute fair use, provided that the use is transformative and does not substitute for the original works;
  • Digitizing purchased print books for internal storage and indexing is likewise protected by the fair use doctrine;
  • Downloading pirated books to build an internal dataset, even if not redistributed or monetized directly, is not fair use. The act of piracy itself remains unexcused.

The ruling does not resolve the case entirely, since it determines neither liability nor damages. It does, however, significantly narrow the scope of lawful uses available to AI developers, and it signals that copyright infringement cannot be justified merely by innovative end goals.

Conclusion

The order issued in Bartz et al. v. Anthropic PBC is a landmark decision in the legal treatment of copyright and artificial intelligence. It affirms that transformative use is central to the fair use doctrine, even in the age of machine learning. However, it also draws a clear red line: no matter how innovative the application, fair use cannot retroactively sanitize the deliberate and large-scale appropriation of pirated material.

The message to AI developers is clear: fair use begins where lawful acquisition begins.

Vincent FAUCHOUX