Anthropic hits back at music publishers in AI copyright lawsuit

Anthropic, a major generative AI startup, laid out its case why accusations of copyright infringement from a group of music publishers and content owners are invalid in a new court filing on Wednesday.

In fall 2023, music publishers including Concord, Universal, and ABKCO filed a lawsuit against Anthropic accusing it of copyright infringement over its chatbot Claude (now supplanted by Claude 2).

The complaint, filed in federal court in Tennessee (home of Nashville, one of America’s “Music Cities” and to many labels and musicians), alleges that Anthropic’s business profits from “unlawfully” scraping song lyrics from the internet to train its AI models, which then reproduce the copyrighted lyrics for users in the form of chatbot responses.

Responding to a motion for preliminary injunction — a measure that, if granted by the court, would force Anthropic to stop making its Claude AI model available — Anthropic laid out familiar arguments that have emerged in numerous other copyright disputes involving AI training data.

Gen AI companies like OpenAI and Anthropic rely heavily on scraping massive amounts of publicly available data, including copyrighted works, to train their models but they maintain this use constitutes fair use under the law. It’s expected the question of data scraping copyright will reach the Supreme Court.

Song lyrics only a ‘miniscule fracion’ of training data

In its response, Anthropic argues its “use of Plaintiffs’ lyrics to train Claude is a transformative use” that adds “a further purpose or different character” to the original works.

To support this, the filing directly quotes Anthropic research director Jared Kaplan, stating the purpose is to “create a dataset to teach a neural network how human language works.”

Anthropic contends its conduct “has no ‘substantially adverse impact’ on a legitimate market for Plaintiffs’ copyrighted works,” noting song lyrics make up “a minuscule fraction” of training data and licensing the scale required is incompatible.

Joining OpenAI, Anthropic claims licensing the vast troves of text needed to properly train neural networks like Claude is technically and financially infeasible. Training demands trillions of snippets across genres may be an unachievable licensing scale for any party.

Perhaps the filing’s most novel argument claims the plaintiffs themselves, not Anthropic, engaged in the “volitional conduct” required for direct infringement liability regarding outputs.

“Volitional conduct” in copyright law refers to the idea that a person accused of committing infringement must be shown to have control over the infringing content outputs. In this case, Anthropic is essentially saying that the label plaintiffs caused its AI model Claude to produce the infringing content, and thus, are in control of and responsible for the infringement they report, as opposed to Anthropic or its Claude product, which reacts to inputs of users autonomously.

The filing points to evidence the outputs were generated through the plaintiffs’ own “attacks” on Claude designed to elicit lyrics.

Irreparable harm?

On top of contesting copyright liability, Anthropic maintains the plaintiffs cannot prove irreparable harm.

Citing a lack of evidence that song licensing revenues have decreased since Claude launched or that qualitative harms are “certain and immediate,” Anthropic pointed out that the publishers themselves believe monetary damages could make them whole, contradicting their own claims of “irreparable harm” (as, by definition, accepting monetary damages would indicate the harms do have a price that could be quantified and paid).

Anthropic asserts the “extraordinary relief” of an injunction against it and its AI models is unjustified given the plaintiffs’ weak showing of irreparable harm. It also argued that any output of lyrics by Claude was an unintentional “bug” that has now been fixed through new technological guardrails.

Specifically, Anthropic claims it has implemented additional safeguards in Claude to prevent any further display of the plaintiffs’ copyrighted song lyrics. Because the alleged infringing conduct cannot reasonably occur again, the model maker says the plaintiffs’ request for relief preventing Claude from outputting lyrics is moot.

It contends the music publishers’ request is overbroad, seeking to restrain use not just of the 500 representative works in the case, but millions of others that the publishers further claim to control.

As well, the AI start up pointed to the Tennessee venue and claimed the lawsuit was filed in the incorrect jurisdiction. Anthropic maintained that it has no relevant business connections to Tennessee. The company noted that its headquarters and principal operations are based in California.

Further, Anthropic stated that none of the allegedly infringing conduct cited in the suit, such as training its AI technology or providing user responses, took place within Tennessee’s borders.

The filing pointed out users of Anthropic’s products agreed any disputes would be litigated in California courts.

Copyright fight far from over

The copyright battle in the burgeoning generative AI industry continues to intensify.

More artists joined lawsuits against art generators like Midjourney and OpenAI with the latter’s DALL-E model, bolstering evidence of infringement from diffusion model reconstructions.

The New York Times recently filed a copyright infringement lawsuit against OpenAI and Microsoft, alleging that their use of scraped Times’ content to train models for ChatGPT and other AI systems violated its copyrights. The suit calls for billions in damages and demands that any models or data trained on Times content be destroyed.

Amid these debates, a nonprofit group called “Fairly Trained” launched this week advocating for a “licensed model” certification for data used to train AI models — supported by Concord and Universal Music Group, among others.

Platforms have also stepped in, with Anthropic, Google and OpenAI as well as content companies like Shutterstock and Adobe pledging legal defenses for enterprise users of AI generated content.

Creators are undaunted though, fighting bids to dismiss claims from authors like Sarah Silverman’s against OpenAI. Judges will need to weigh technological progress and statutory rights in nuanced disputes.

Furthermore, regulators are listening to worries over datamining scopes. Lawsuits and congressional hearings may decide whether fair use shelters proprietary appropriations, frustrating some while enabling others. Overall, negotiations seem inevitable to satisfy all involved as generative AI matures.

What comes next remains unclear, but this week’s filing suggests generative AI companies are coalescing around a core set of fair use and harm-based defenses, forcing courts to weigh technological progress against rights owners’ control.

As VentureBeat reported previously, no copyright plaintiffs so far have won a preliminary injunction in these types of AI disputes. Anthropic’s arguments aim to ensure this precedent will persist, at least through this stage in one of many ongoing legal battles. The endgame remains to be seen.

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.

Source link

Song lyrics only a ‘miniscule fracion’ of training data

Irreparable harm?

Copyright fight far from over

Related News

You may have missed