Risk: OpenAI destroyed databases containing more than 100,000 books used to train ChatGPT

Photo: Getty Images

The conflict between the Authors' Union and OpenAI, owner of ChatGPT, has entered a new chapter, with documents showing that the startup used thousands of books to train its algorithms.

The union is suing the startup, claiming that OpenAI infringed the copyright of published works by using them for AI training.

New evidence suggests that the startup deleted two databases, known as books1 and books2, which contained more than 100,000 published works. According to Business Insider, OpenAI has been reluctant to acknowledge the existence of these files.

Newly released documents, dated 2020, reveal that the books1 and books2 databases accounted for 16% of the training data used to create GPT-3, totaling 50 billion words.

OpenAI's lawyers claim that training on these book databases was discontinued at the end of 2021, that the databases were deleted the following year, and that none of the current ChatGPT models were created using these files. Furthermore, those responsible for creating the files are no longer with the company.

Published books are crucial to training high-quality AI models, but the lack of financial compensation for copyright holders has led to legal disputes, including the lawsuits brought by the Authors' Union. The startup is seeking to keep the contents of the databases and the identities of the employees confidential.

Charlotte Whitmore

