June 5, 2025

Generative AI Training and Copyright Law in the United States: In-Depth Review of the U.S. Copyright Office’s May 2025 Report and Its Political Reverberations

1. An Authoritative Framework in a Time of Legal Uncertainty

On May 9, 2025, the U.S. Copyright Office published a pre-publication draft of the third and final part of its seminal series Copyright and Artificial Intelligence, titled Generative AI Training. This 234-page document represents the most comprehensive legal and policy articulation to date from a U.S. federal authority on the implications of using copyrighted works to train foundation models—systems capable of generating expressive outputs (text, music, images, and more) based on massive datasets often compiled without licenses.

The report was issued by the Register of Copyrights, Shira Perlmutter, and is the result of a lengthy public inquiry initiated in 2023. It integrates findings from more than 10,000 submissions from major AI developers, creators' collectives, publishers, scholars, and members of the public.

Although formally labeled “pre-publication,” the Register has stated that no material changes are expected before final adoption. The report is organized into seven chapters and multiple appendices, reflecting both legal analysis under existing copyright doctrines and legislative policy options for future reform.

2. The Central Legal Question: Does AI Training Infringe Copyright?

At the heart of the report lies the recognition that the copying of copyrighted works during AI training is not speculative—it is a necessary and undisputed feature of how foundation models are developed. Developers copy large volumes of data, often in full, to create statistical representations that inform the model’s ability to generate new content.

The Office acknowledges that such copying likely constitutes prima facie infringement under U.S. copyright law—specifically under 17 U.S.C. §§ 106(1) and (2) concerning the rights of reproduction and preparation of derivative works—unless a valid exception such as fair use applies.

The report further clarifies that infringement does not necessarily require the model to produce outputs identical or substantially similar to the copyrighted inputs; the unauthorized copying itself, even if the copies are never publicly disseminated, may suffice for liability. This position aligns with case law such as American Geophysical Union v. Texaco and the foundational MAI Systems Corp. v. Peak Computer decision.

3. Fair Use: A Legally Unstable Safe Harbor for Foundation Models

The most detailed portion of the report is devoted to the application of the four statutory fair use factors (17 U.S.C. § 107) to AI training, which the Office approaches with analytical rigor:

  • Purpose and Character of the Use: While some AI proponents argue that training is transformative because it creates a new tool rather than reproducing specific works, the Office is skeptical. It underscores that transformative use requires more than technological repurposing: the use must add something new, altering the original with new expression, meaning, or message. Citing Andy Warhol Foundation for the Visual Arts v. Goldsmith, the Office warns against conflating automation with creativity, especially when outputs are used commercially.
  • Nature of the Work: Training data typically include a large proportion of highly creative, expressive works, such as literary fiction, songs, visual art, or journalism, which are entitled to the greatest degree of copyright protection. The inclusion of unpublished works (e.g., private blogs, drafts, or emails) further complicates claims of fair use: since Harper & Row v. Nation Enterprises, courts have treated a work's unpublished status as weighing against fair use, although § 107 provides that this status does not by itself bar a finding of fair use.
  • Amount and Substantiality: Most models ingest entire works, not fragments, with training corpora often running to billions of tokens. Courts have generally found that reproducing whole works weighs against fair use unless doing so is necessary and justified by the purpose of the use. The Office notes that some models even duplicate individual works thousands of times for technical optimization, raising serious legal concerns.
  • Market Effect: This is where the Office expresses the greatest concern. AI-generated content is already substituting for human creative work in multiple sectors (e.g., illustration, photography, marketing copy). The report emphasizes that lost licensing revenue, or the destruction of emerging licensing markets, is not speculative but demonstrable. Training without permission or compensation may therefore cause market harm, weakening the fair use defense.

The overall conclusion is that, while some narrowly defined non-commercial or research uses may qualify as fair, commercial foundation model training is unlikely to fall within the safe harbor of § 107 absent a robust transformative purpose or public benefit.

4. Outputs, Memorization, and Derivative Works: Legal Uncertainties

A separate section of the report addresses whether the outputs of generative models could themselves infringe copyright. The Office distinguishes between outputs that are:

  • Statistically generated and novel, and
  • “Regurgitated,” reproducing source material too closely, especially in verbatim form.

The report refers to documented cases where models reproduced entire passages from copyrighted books or entire photographs when prompted. If such outputs are substantially similar to protected inputs, they could constitute derivative works (under § 106(2)) or infringing reproductions. Developers are thus urged to implement anti-memorization safeguards and to be transparent with users about risks of output-based liability.

5. Legislative and Licensing Proposals: A Way Forward

To reduce reliance on an unstable fair use defense, the Copyright Office outlines three categories of possible licensing approaches:

  • Voluntary Licensing: Encouraging the development of transparent, machine-readable licensing schemes. This includes proposals for AI-specific metadata standards and web-wide opt-out protocols, akin to the Copyright Office’s pilot “Standard Technical Measures” project.
  • Extended Collective Licensing (ECL): Inspired by European and Canadian frameworks, ECL would authorize collecting societies to license large-scale uses on behalf of rights holders, including those not individually contracted. The Office views this as a realistic compromise, especially for visual and audio works.
  • Statutory Compulsory Licensing: While not formally recommended, the report acknowledges pressure for Congress to consider compulsory licenses akin to those under § 115 (musical compositions). The Office warns, however, that such models must be narrowly tailored, technologically neutral, and constitutionally sound.

The Office also calls for mandatory disclosure requirements on dataset composition, training protocols, and memorization risks. Such regulation could form part of future rulemaking or federal AI legislation.

6. Political Fallout: The Dismissal of Shira Perlmutter

One day after the report’s release, on May 10, 2025, President Donald Trump abruptly dismissed Shira Perlmutter by email. Multiple news outlets, including Reuters and NPR, reported that the dismissal followed internal disagreements about the Office’s regulatory posture toward generative AI. Perlmutter subsequently filed suit, alleging that only the Librarian of Congress has the legal authority to remove the Register, a claim grounded in 17 U.S.C. § 701(a), which vests the appointment of the Register in the Librarian of Congress.

A federal district court declined to issue an injunction reinstating her, but litigation is ongoing. Legal scholars, including Pamela Samuelson and James Grimmelmann, have warned that such politicized removals threaten the independence of copyright administration, especially at a time of global regulatory divergence on AI and intellectual property.

The firing has sparked debate within the U.S. legal community as to whether the Copyright Office should be reconstituted as an independent agency, immune from presidential interference.

7. Conclusion: Between Law, Code, and Power

The 2025 Copyright Office report is not just a legal analysis; it is a strategic roadmap for the future of generative AI in the United States. It insists, against mounting pressure, that copyright law remains fully applicable to large-scale automated systems, and that creators are entitled to know about, consent to, and be compensated for the use of their works to train machines.

While the political backlash underscores the volatility of this terrain, the report remains a foundational legal reference. For AI developers, legal counsel, policymakers, and content creators, it sets out a path that balances innovation with constitutional protection, and machine capability with human authorship.

Vincent FAUCHOUX