OpenAI challenges New York Times lawsuit on fair use and copyright

OpenAI is challenging a lawsuit from The New York Times in the court of public opinion. In a blog post, the company said the lawsuit lacks merit and suggested that the Times isn’t providing a complete picture of the situation.

The Times claims copyright infringement, asserting that OpenAI and Microsoft used the newspaper’s articles to train their chatbot, ChatGPT.

The lawsuit argues that the Times stands to lose customers and revenue if it’s forced to compete with ChatGPT as a news source.

This raises the question of whether OpenAI is working within the confines of fair use, or whether the litigation opens the door to crafting a new framework for such assessments.

“As a copyright lawyer and an academic, this is the first thing that I wanted to know,” said Matthew Sag, a professor of law at Emory University who specializes in the intersection of intellectual property and generative AI.

Courts have indicated that large language models processing huge amounts of data to generate abstract information on copyrighted material may qualify for fair use under U.S. law.

“Generative AI is kind of a slippery term,” Sag said. “I mean, what we’re really talking about is sort of a subset of machine learning programs. And the way machine learning works is that rather than starting with a theory and then, you know, testing that like a normal statistician, you basically throw an incredible amount of data at a model and the model keeps tweaking itself in successive rounds of training, trying to get better.”
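To make the “successive rounds of tweaking” Sag describes concrete, here is a minimal, hypothetical sketch in plain Python. It trains a toy two-parameter model, nothing like a production system, but the loop structure is the same idea: start with arbitrary parameters and repeatedly nudge them to reduce error on the training data.

```python
# Toy illustration of iterative training, not OpenAI's actual code.
# A model (y = w*x + b) starts with arbitrary parameters and, on each
# round, adjusts them slightly to reduce its error on the training data.

training_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # examples the model learns from

w, b = 0.0, 0.0          # initial guesses; the model knows nothing yet
learning_rate = 0.01     # how big each "tweak" is

for round_num in range(1000):          # successive rounds of training
    grad_w, grad_b = 0.0, 0.0
    for x, y in training_data:
        error = (w * x + b) - y        # how wrong the current model is
        grad_w += 2 * error * x        # direction to tweak w
        grad_b += 2 * error            # direction to tweak b
    n = len(training_data)
    w -= learning_rate * grad_w / n    # the model "tweaks itself"
    b -= learning_rate * grad_b / n

print(f"learned: y = {w:.2f}x + {b:.2f}")  # converges toward y = 2x + 1
```

A large language model does the same thing at vastly greater scale, adjusting billions of parameters over enormous amounts of text rather than two parameters over three data points.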

The issue remains highly debatable, however, particularly when it comes to potential infringement.

“One of the things that’s really impressive about The New York Times complaint is that they show, like a lot of examples of, ‘Hey, you didn’t just learn abstract things, you kind of seem to have learned how to copy our works exactly.’ And quite frankly, I was shocked at how impressive the evidence was, but that evidence has not been tested,” Sag said.

For example, someone can ask ChatGPT to summarize a specific historical event, and within seconds the generative AI will produce a summary of whatever length the user requested. The AI can also write songs in the style of a particular artist, raising the question of where that information came from and how a user can be sure it is not a direct copy of the original source.
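For readers curious what such a request looks like outside the chat interface, here is a minimal sketch using OpenAI’s Python SDK. The model name and prompt are illustrative (the prompt mirrors the Shays’ Rebellion example from the broadcast), and a valid API key in the OPENAI_API_KEY environment variable is assumed.

```python
# Hypothetical sketch: requesting a summary via OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY env variable
response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user",
               "content": "Summarize Shays' Rebellion in about 150 words."}],
)
print(response.choices[0].message.content)
```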

These outputs are called regurgitations or memorizations, meaning the model might generate text that is similar or identical to phrases, sentences or passages from the data it was trained on. It’s a phenomenon in which the model appears to reproduce specific patterns from its training set rather than generating a novel response.
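One simple way to test for this kind of regurgitation, sketched below under the assumption that the candidate source text is available for comparison, is to look for long word sequences that the model’s output shares verbatim with that source. The example texts and the eight-word threshold are illustrative.

```python
# Minimal sketch of a regurgitation check: flag long runs of words that a
# model's output shares verbatim with a known source text.

def shared_ngrams(source: str, output: str, n: int = 8) -> set:
    """Return n-word sequences appearing verbatim in both texts."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(source) & ngrams(output)

# Illustrative texts, not real training data or model output.
article = ("the senator said the vote would be held on tuesday "
           "after a lengthy debate in the chamber")
generation = ("reports indicate the senator said the vote would be "
              "held on tuesday after a lengthy debate")

overlaps = shared_ngrams(article, generation)
if overlaps:
    print(f"{len(overlaps)} verbatim 8-word overlaps: possible memorization")
```

The complaint’s exhibits make essentially this kind of side-by-side comparison at much greater length, which is why Sag calls the evidence impressive even though it has not yet been tested in court.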

“The regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites,” OpenAI said in response to the Times’ lawsuit. “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.” 

OpenAI and Microsoft have not submitted formal counterarguments in the New York cases. The companies are required to answer the summons by Jan. 18.

“You know, the NYT complaint, it’s impressive,” Sag said. “And if what they’re showing really is representative of what GPT-4 is doing, then, you know, they’re hard put to argue that it’s a non-expressive use. I’m still skeptical, but I think it’s one we have to wait and see how it plays out.”

[LAUREN TAYLOR]

OPEN A-I IS CHALLENGING A LAWSUIT FROM THE NEW YORK TIMES IN THE COURT OF PUBLIC OPINION — IN A BLOG POST THE COMPANY SAID THE LAWSUIT LACKS MERIT WHILE SUGGESTING THE TIMES ISN’T PROVIDING A COMPLETE PICTURE OF THE SITUATION.

THE TIMES CLAIMS COPYRIGHT INFRINGEMENT, ASSERTING THAT OPEN A-I AND MICROSOFT USED THE NEWSPAPER’S ARTICLES TO TRAIN THEIR CHATBOT, CHAT G-P-T.

THE LAWSUIT ARGUES THAT THE TIMES STANDS TO LOSE CUSTOMERS AND REVENUE IF IT’S FORCED TO COMPETE WITH CHAT G-P-T AS A NEWS SOURCE.

BUT WHAT IS THE FULL PICTURE? IS OPEN A-I WORKING WITHIN THE CONFINES OF FAIR USE, OR COULD THIS LITIGATION OPEN THE DOOR TO CRAFTING A NEW FRAMEWORK FOR SUCH ASSESSMENTS?

[MATTHEW SAG]

As a copyright lawyer and an academic, this is the first thing that I wanted to know. 

[LAUREN TAYLOR]

THAT’S MATTHEW SAG, A PROFESSOR OF LAW AT EMORY UNIVERSITY. HE SPECIALIZES IN THE INTERSECTION OF INTELLECTUAL PROPERTY AND GENERATIVE A-I.

[MATTHEW SAG]

Generative A.I. is kind of a slippery term. I mean, what we’re really talking about is sort of a subset of machine learning programs. And the way machine learning works is that rather than starting with a theory and then you know testing that like a normal statistician, you basically throw an incredible amount of data at a model and the model keeps tweaking itself in successive rounds of training, trying to get better. 

[LAUREN TAYLOR]

COURTS HAVE INDICATED THAT LARGE LANGUAGE MODELS, WHICH PROCESS HUGE AMOUNTS OF DATA TO GENERATE ABSTRACT INFORMATION ABOUT COPYRIGHTED MATERIAL, MAY QUALIFY FOR FAIR USE UNDER U.S. LAW.

HOWEVER, THIS IS A NUANCED AND DEBATABLE ISSUE, PARTICULARLY WHERE POTENTIAL INFRINGEMENT IS CONCERNED.

[MATTHEW SAG]

One of the things that’s really impressive about The New York Times complaint is that they show, like a lot of examples of, hey, you didn’t just learn abstract things…you kind of seem to have learned how to copy our works exactly. And quite frankly, I was shocked at how impressive the evidence was, but that evidence has not been tested. 

[LAUREN TAYLOR]

FOR EXAMPLE, YOU CAN HEAD TO CHAT G-P-T AND ASK IT TO SUMMARIZE SHAYS’ REBELLION, A FARMER-LED UPRISING IN MASSACHUSETTS IN 17-86. THE GENERATIVE A-I SPITS OUT 150 WORDS SUMMARIZING THE MOVEMENT WITHIN SECONDS.

OR YOU COULD PROMPT IT TO WRITE A SONG IN THE STYLE OF YOUR FAVORITE ARTIST ABOUT DRINKING A CUP OF COFFEE ON A SNOWY DAY.

THAT RAISES THE QUESTION: WHERE DID THAT INFORMATION COME FROM, AND HOW CAN YOU BE SURE IT ISN’T A DIRECT COPY FROM OFFICIAL SOURCES?

THESE ARE CALLED REGURGITATIONS OR MEMORIZATIONS. THIS MEANS THE MODEL MIGHT GENERATE TEXT THAT IS SIMILAR OR IDENTICAL TO PHRASES, SENTENCES, OR PASSAGES FROM THE DATA IT WAS TRAINED ON. IT’S A PHENOMENON WHERE THE MODEL APPEARS TO REPRODUCE OR MEMORIZE SPECIFIC PATTERNS FROM ITS TRAINING SET RATHER THAN GENERATING NOVEL OR CONTEXTUALLY APPROPRIATE RESPONSES.

IN RESPONSE TO THE TIMES’ LAWSUIT OPEN A-I SAYS THAT QUOTE:

The regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. 

[MATTHEW SAG]

You know, the NYT complaint, it’s impressive. And if what they’re showing really is representative of what GPT-4 is doing, then, you know, they’re hard put to argue that it’s a non-expressive use. I’m still skeptical but I think it’s one we have to wait and see how it plays out.

[LAUREN TAYLOR]

OPEN A-I AND MICROSOFT HAVE NOT SUBMITTED FORMAL COUNTERARGUMENTS IN THE NEW YORK CASES. THEY ARE REQUIRED TO ANSWER THE SUMMONS BY JANUARY 18.