A recent decision by a judge in the Southern District of New York has dealt a blow to OpenAI in its ongoing copyright litigation. U.S. District Judge Sidney H. Stein has ruled that a group of authors can move forward with their claim that OpenAI unlawfully downloaded their copyrighted books to train its artificial intelligence models.
OpenAI had filed a motion to strike what it called a new “download claim,” arguing that the complaint introduced previously unmentioned legal theories. However, Judge Stein rejected this argument, stating that earlier versions of the lawsuit provided sufficient notice of the core allegations: that the company copied and used entire books without permission or compensation. According to the judge, the plaintiffs are not required to articulate a specific legal theory at this stage—what matters are the factual assertions.
“The complaint need not pin plaintiff’s claim for relief to a precise legal theory,” Judge Stein wrote. “Factual allegations alone are what matters.”
While the judge allowed the download-related claim to stand, he did grant OpenAI partial relief by striking specific references to future or unreleased AI models such as GPT-4V, GPT-4.5, GPT-5, and any unnamed successors or derivatives. Stein explained that these references exceeded the scope of his earlier ruling, which limited the case to technologies already in use or previously disclosed.
This legal battle is one of several confronting major AI developers, including Meta and Anthropic, as courts begin to scrutinize the ways in which large language models are trained. Central to this debate is the issue of “fair use”—a legal doctrine that permits limited use of copyrighted material under specific conditions, such as commentary, criticism, or education. However, the application of fair use to AI training remains legally untested and highly contentious.
Authors argue that their books, often available only through purchase or library systems, are being used without consent to develop commercial AI tools that generate text on demand—tools that could potentially compete with or devalue the original works. Some allege that entire books were scraped from online repositories, including pirated sources, and fed into AI systems that can now summarize, paraphrase, or even mimic an author’s voice and style.
The ruling may have far-reaching implications for the AI industry. If courts determine that AI companies must obtain licenses for all copyrighted content used in model training, the costs and logistics of developing these systems could increase dramatically. On the other hand, if courts side with the tech firms and find that such use qualifies as fair use, creators may find themselves with little recourse against the unauthorized use of their work.
Legal scholars are watching these developments closely. The tension between innovation and intellectual property rights could reshape how AI is built and regulated in the coming years. Judge Stein's decision suggests that courts are not yet ready to dismiss the concerns of content creators and are willing to allow these cases to proceed to discovery and potentially to trial.
Meanwhile, OpenAI maintains that its training practices fall within the bounds of fair use and that the data used to develop models like ChatGPT is sourced from publicly available material. The company asserts that AI models do not store exact copies of texts but rather learn patterns in language, making them fundamentally different from traditional copying.
However, plaintiffs counter that even if the models don’t memorize books verbatim, the use of entire literary works as training input still constitutes reproduction under copyright law. They argue that this form of use is highly exploitative, especially when it involves commercial outputs like subscription-based AI services.
This case is part of a broader legal reckoning in the AI space. Over the past year, several lawsuits have been filed by artists, authors, and media companies against AI developers, accusing them of profiting from unauthorized use of copyrighted material. These include suits over visual art, news articles, and even music.
The outcome of these cases could eventually define the boundaries of lawful AI training and determine whether current copyright frameworks are sufficient to address the challenges posed by machine learning technologies. Some experts suggest that legislative action may be needed to clarify the rules, especially as AI continues to evolve rapidly and becomes more deeply integrated into creative and commercial workflows.
In addition to the legal risks, the ethical debate is also intensifying. Critics argue that training AI on copyrighted works without consent undermines the value of human creativity and sets a dangerous precedent. Others worry that if AI models can absorb and replicate an author’s style, it may blur the line between inspiration and imitation, making it even harder to protect intellectual property.
For now, the litigation against OpenAI will proceed, with discovery likely to shed more light on the specific materials used in training and how the company accessed them. The outcome could set a precedent that influences not only how AI companies operate but also how future AI laws are drafted and enforced.