A matrix of books whose covers have been torn off

Bartz et al. v. Anthropic PBC is an ongoing case in the Northern District of California that is of interest because it contains a small win for creators, at least on its own facts. In another Anthropic case from the same district, a different judge dealt a more definitive blow to plaintiffs suing Anthropic for copyright infringement, vicarious infringement, and intentional removal of copyright management information.

Here, five authors brought a proposed class action against Anthropic for copyright infringement over its use of their works to train large language models. Anthropic responded by moving for summary judgment on a fair use defense, calling its use of the works transformative.

Anthropic’s allegedly infringing behavior arose from its attempt to procure “all the books in the world” to train its main product, Claude. In its early days, Anthropic admittedly relied on versions of books that were known to be pirated, with one of its sources having the word “pirate” in its name. While that set of acquired books is not being used for training purposes, it still exists.

At some point, probably when it got better counsel (or simply listened to counsel…), Anthropic had a change of heart and began to acquire books legally. Under the “first sale doctrine,” someone who purchases a copy of a book can generally do what they want with that copy, short of reproducing it. On this rationale, Anthropic purchased millions of used books, scanned them, and destroyed the originals.

In its fair use analysis, the court said that even if the works are expressive in nature, and even if training on the plaintiffs’ better examples of writing enables the AI to write better, Anthropic was not replacing the market for those books. Instead, it was using them for a different purpose: to teach, even if the student was Claude, how to be a better writer. At one point the court said such behavior was no more infringing than teaching schoolchildren to write better would be.

Further into its fair use analysis, the court addressed whether the amount of each work Anthropic used was appropriate, given that its practice was to scan entire books. The court said that the entire book was in fact the appropriate amount to copy, and that doing so did not impact the authors in any way. The court granted summary judgment as to the works Anthropic had purchased, copied, and thrown away.

As for the set of pirated works that Anthropic collected early in its existence, the court reserved judgment and invited the parties to come back to determine what the damages might be after this ruling was taken into account. 

The fact that Anthropic switched from pirated works to purchased works as training material might mean heightened statutory damages, because it suggests that intentional infringement had been its status quo ante. It also showed that Anthropic knew what it had been doing was wrong. Additionally, the court noted that a representative from Anthropic reached out to two publishing houses about working together but let those connections “wither.”

This seems analogous to the early days of sampling music. As a saxophone player, I was once approached by a producer client who wanted to record me playing samples of each note in the horn’s range. The purpose was to better enable him to use the samples to make music with actual saxophone sounds, without a horn player. I declined because even though the recordings of my playing could be used to make new music, reusing a sample of each note would eliminate the need for any live musician to make the new music. Such use probably would be deemed transformative, and so wouldn’t infringe on anyone’s copyright — but do we really need or want to eliminate human beings from creating original content?

About the Author

Kaufman & Kahn kaufman@kaufmankahn.com 10 Grand Central, 155 East 44th Street, 19th Floor New York, NY 10017 Tel. (212) 293-5556 Fax. (212) 355-5009