On May 9, 2025, the U.S. Copyright Office published a pre-publication draft of the third and final part of its seminal series Copyright and Artificial Intelligence, titled Generative AI Training. This 234-page document represents the most comprehensive legal and policy articulation to date from a U.S. federal authority on the implications of using copyrighted works to train foundation models—systems capable of generating expressive outputs (text, music, images, and more) based on massive datasets often compiled without licenses.
The report was issued by the Register of Copyrights, Shira Perlmutter, and is the result of a lengthy public inquiry initiated in 2023. It integrates findings from over 10,000 submissions, including those from major AI developers, creators' collectives, publishers, scholars, and members of the public.
Although formally labeled “pre-publication,” the Register has stated that no material changes are expected before final adoption. The report is organized into seven chapters and multiple appendices, reflecting both legal analysis under existing copyright doctrines and legislative policy options for future reform.
At the heart of the report lies the recognition that the copying of copyrighted works during AI training is not speculative—it is a necessary and undisputed feature of how foundation models are developed. Developers copy large volumes of data, often in full, to create statistical representations that inform the model’s ability to generate new content.
The Office acknowledges that such copying likely constitutes prima facie infringement under U.S. copyright law—specifically under 17 U.S.C. §§ 106(1) and (2) concerning the rights of reproduction and preparation of derivative works—unless a valid exception such as fair use applies.
The report further clarifies that infringement does not necessarily require the model to produce outputs identical or substantially similar to the copyrighted inputs; the mere act of copying without authorization, even if not publicly disseminated, may suffice for liability. This assertion aligns with case law such as American Geophysical Union v. Texaco Inc. (2d Cir. 1994) and MAI Systems Corp. v. Peak Computer, Inc. (9th Cir. 1993), which held that even intermediate copies made in a computer's memory can infringe.
The most detailed portion of the report is devoted to the application of the four statutory fair use factors (17 U.S.C. § 107) to AI training, which the Office approaches with analytical rigor: (1) the purpose and character of the use, including whether it is transformative and whether it is commercial; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the work as a whole; and (4) the effect of the use upon the potential market for or value of the work.
The overall conclusion is that, while some narrowly defined non-commercial or research uses may qualify as fair, commercial foundation model training is unlikely to qualify as fair use under § 107 absent a robust transformative purpose or demonstrable public benefit; the Office emphasizes that fair use is a case-by-case affirmative defense, not a categorical safe harbor.
A separate section of the report addresses whether the outputs of generative models could themselves infringe copyright. The Office draws a distinction between outputs that merely reflect statistical patterns abstracted from the training data and outputs that reproduce protected expression from it.
The report refers to documented cases in which models, when prompted, reproduced entire passages from copyrighted books or entire photographs. If such outputs are substantially similar to protected inputs, they could constitute derivative works (under § 106(2)) or infringing reproductions. Developers are thus urged to implement anti-memorization safeguards and to be transparent with users about the risks of output-based liability.
To address the unsustainable reliance on fair use, the Copyright Office outlines three categories of possible licensing reforms: voluntary direct licensing negotiated between developers and rightsholders; collective licensing, including extended collective licensing (ECL) administered by rights organizations; and compulsory or statutory licensing established by Congress.
The Office also calls for mandatory disclosure requirements on dataset composition, training protocols, and memorization risks. Such regulation could form part of future rulemaking or federal AI legislation.
One day after the report's release, on May 10, 2025, President Donald Trump abruptly dismissed Shira Perlmutter by email. Multiple news outlets, including Reuters and NPR, reported that the dismissal followed internal disagreements about the Office's regulatory posture toward generative AI. Perlmutter subsequently filed suit, alleging that only the Librarian of Congress has the legal authority to remove the Register, a claim grounded in 17 U.S.C. § 701, under which the Register is appointed by and answerable to the Librarian of Congress.
A federal district court declined to issue an injunction reinstating her, but litigation is ongoing. Legal scholars, including Pamela Samuelson and James Grimmelmann, have warned that such politicized removals threaten the independence of copyright administration, especially at a time of global regulatory divergence on AI and intellectual property.
The firing has sparked debate within the U.S. legal community as to whether the Copyright Office should be reconstituted as an independent agency, immune from presidential interference.
The 2025 Copyright Office report is not just a legal analysis—it is a strategic roadmap for the future of generative AI in the United States. It insists, against mounting pressure, that copyright law remains fully applicable to large-scale automated systems, and that creators are entitled to know, consent, and be compensated when their works are used to train machines.
While the political backlash underscores the volatility of this terrain, the report remains a foundational legal reference. For AI developers, legal counsel, policymakers, and content creators, it sets out a path that seeks to balance innovation with the constitutional purposes of copyright, and machine capability with human authorship.