Ten years ago, Stephen Hawking told Last Week Tonight’s John Oliver a chilling but memorable fictional story about the potential dangers of AI. The gist is that a group of scientists builds a superintelligent computer and asks it, “Is there a God?” The computer replies, “I have it now,” and lightning strikes the plug, preventing it from shutting down. I hope that doesn’t happen with OpenAI and the evidence lost in the New York Times plagiarism lawsuit.
Wired reports that a court statement filed by The New York Times on Wednesday says that OpenAI engineers accidentally erased evidence of the AI’s training data, which took a long time to examine and edit. That’s what it means. OpenAI recovered some of the data, but the “original filenames and folder structure” indicating when the AI copied the article into the model is still missing.
OpenAI spokesperson Jason Dutrom disagreed with the NYT’s claims and said the company would “submit a response soon.” The Times has been in a battle with Microsoft and OpenAI since December over alleged copyright infringement by the companies’ AI models.
The case is still in the discovery phase, where both sides are requesting and submitting evidence to move the case forward to trial. OpenAI had to hand over training data to the Times, but it has not made public the exact information it used to build its AI modes.
Instead, OpenAI created a “sandbox” of two virtual machines for the NYT’s legal team to investigate. The NYT’s legal team spent more than 150 hours reviewing the data on one machine before it was deleted. OpenAI acknowledged the deletion, but its legal team called it a “glitch.” OpenAI engineers attempted to correct the mistake, but the NYT’s work was missing from the recovered data. This essentially forced the NYT to rebuild everything from scratch. A lawyer for the NYT said there was no reason to believe the deletion was intentional.
