New Lawsuit Ropes Microsoft Into OpenAI’s Legal Battle With Authors Over Training Data

November 22, 2023

New Lawsuit Ropes Microsoft Into OpenAI’s Legal Battle With Authors Over Training Data

A new lawsuit from an author whose work was used to train OpenAI‘s artificial intelligence model spotlights the company’s partnership with Microsoft to create ChatGPT.

The suit, filed in New York federal court on Tuesday, thrusts the tech giant into the unfolding legal battle for the alleged “rampant theft” of copyrighted material to fuel one of the most promising start-ups in Silicon Valley. OpenAI, with a valuation close to $90 billion, has entrenched Microsoft as a leader in generative AI. The lawsuit underscores Microsoft’s “key role” in providing “critical assistance” in the creation of unlicensed copies of authors’ works to use as training data and the commercialization of GPT-based technology, as well as its knowledge of OpenAI indiscriminately crawling the internet for copyrighted material to train its model.

The filing of the suit follows an unexpected coup over the weekend by Microsoft, which snagged Sam Altman to lead its artificial intelligence research team after he was ousted from OpenAI. While OpenAI has been named in at least four copyright infringement suits, Microsoft has largely avoided scrutiny as it rolls out a fleet of products that are integrated with GPT.

Unlike previous suits led mostly by fiction writers, this one was filed by Julian Sancton, author of narrative nonfiction book Madhouse at the End of the Earth and senior features editor at The Hollywood Reporter, and mostly focuses on nonfiction and academic journals. (THR is not a party to this lawsuit.) He claims that Microsoft has been “deeply involved in the training, development, and commercialization” of OpenAI’s GPT-based products, pointing to the company providing a specialized computing system to train the model, which was necessary given the volume of the dataset.

“Microsoft’s Azure provided the cloud computing systems that powered the training process, and continues to power OpenAI’s operations to this day,” states the complaint. “Without these bespoke computing systems, OpenAI would not have been able to execute and profit from the mass copyright infringement alleged herein.”

According to the complaint, Microsoft chief executive Satya Nadella said in a February interview on CNBC that “beneath what OpenAI is putting out as large language models, remember, the heavy lifting was done by the Azure team to build the compute infrastructure.” The suit argues he was referring to the company’s intimate involvement in developing, maintaining and supporting OpenAI’s supercomputing system. Through that process and its decision to invest $13 billion into the firm, Sancton says that Microsoft should’ve also become aware its partner was engaging in “largescale copyright infringement” in violation of intellectual property laws.

And departing from some other similar cases involving AI firms, the suit alleges that the companies directly made tens of thousands of unlicensed copies of copyrighted works for the purpose of training their AI system.

“While OpenAI was responsible for designing the calibration and fine-tuning of the GPT models—and thus, the largescale copying of this copyrighted material involved in generating a model programmed to accurately mimic Plaintiff’s and others’ styles—Microsoft built and operated the computer system that enabled this unlicensed copying in the first place,” writes Sancton’s lawyer, Craig Smyser of Susman Godfrey, in the complaint.

On dismissal, AI companies have largely maintained that training their systems doesn’t involve wholesale copying of works but rather involves development of parameters — like lines, colors, shades and other attributes associated with subjects and concepts — from those works that collectively define what things look like. U.S. District Judge William Orrick, who’s overseeing a case between artists and AI art generators, based his dismissal of some claims last month based on the reasoning that AI models may not actually contain copies of copyrighted material. The issue remains contested, though authors may have an easier time offering evidence of copying since they can point to ChatGPT responses confirming examples of work that were incorporated into training data. Shortly after its release, the AI tool confirmed, “Yes, Julian Sancton’s book ‘Madhouse at the End of the Earth’ is included in my training data,” according to the complaint, which notes that ChatGPT has been modified to avoid divulging such details.

Microsoft’s involvement wasn’t limited to product development either, the suit claimed, and extended to playing a key role in commercializing GPT-based technology. For example, the company has unveiled Bing Chat, a human-mimicking AI chatbot on its search engine. OpenAI, in turn, integrated ChatGPT with a “Browse with Bing” feature. “Indeed, recent events have further demonstrated the close relationship between OpenAI and Microsoft,” the complaint states. “When OpenAI CEO Sam Altman was terminated, Microsoft hired him.”

In the complaint, Sancton presses arguments that the alleged misconduct was “manifestly unfair use” since users may substitute buying his book with reviewing ChatGPT’s content on his work to learn from his writing style. He also says that OpenAI and Microsoft have deprived authors of potential licensing agreements, citing deals with content creators like the Associated Press and other unidentified parties. If not for the alleged infringement, blanket licensing practices would be possible through an entity like the Copyright Clearance Center, according to the complaint.

The allegations are intended to leverage the Supreme Court’s recent decision in Andy Warhol Foundation for the Visual Arts v. Goldsmith, which effectively reined in the scope of the fair use defense to copyright infringement. In that case, the majority stressed that an analysis of whether an allegedly infringing work was sufficiently transformed must be balanced against the “commercial nature of the use.” This means that fair use is less likely to be found if, for example, AI companies undercut creators’ economic prospects to profit off of their works by scraping material from the internet instead of pursuing licensing deals.

Microsoft didn’t immediately respond to a request for comment. It announced last week that it will defend customers from any “adverse judgments” if they’re sued for copyright infringement stemming from use of Azure OpenAI Service.

In a statement, Sancton said, “It is concerning for anyone who writes for a living to see our work be used without permission or compensation to build large language models that capitalize on our expression for profit.”

On Monday, a federal judge dismissed the bulk of Sarah Silverman’s suit against Meta. U.S. District Judge Vince Chhabria identified “nonsensical” and “not viable” theories surrounding some of her core arguments that the company’s AI model is itself an infringing derivative work and that every result it produces constitutes copyright infringement.

Posted in Uncategorized

BLOG

New Lawsuit Ropes Microsoft Into OpenAI’s Legal Battle With Authors Over Training Data

Leave a Reply Cancel reply