
"OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher. At the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, OpenAI's decision to delete the datasets could end up being a deciding factor that gives the authors the win."
"It's undisputed that OpenAI deleted the datasets, known as "Books 1" and "Books 2," prior to ChatGPT's release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web and seizing the bulk of its data from a shadow library called Library Genesis (LibGen). As OpenAI tells it, the datasets fell out of use within that same year, prompting an internal decision to delete them. But the authors suspect there's more to the story than that."
"In fact, OpenAI's reversal only made authors more eager to see how OpenAI discussed "non-use," and now they may get to find out all the reasons why OpenAI deleted the datasets. Last week, US district judge Ona Wang ordered OpenAI to share all communications with in-house lawyers about deleting the datasets, as well as "all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.""
A class-action lawsuit alleges ChatGPT was illegally trained on authors' works. OpenAI deleted two datasets called "Books 1" and "Books 2" before ChatGPT's 2022 release. The datasets were built in 2021 by former OpenAI employees using web scraping and data from Library Genesis (LibGen). OpenAI initially said the datasets fell out of use and were deleted for that reason. Authors assert OpenAI later retracted that explanation and invoked attorney-client privilege to withhold deletion reasons. A judge ordered disclosure of communications with in-house lawyers and internal references to LibGen. The deleted datasets and communications may be decisive evidence in the copyright claims.
Read at Ars Technica