Cheraw Chronicle

Complete News World

New York Times: Google and OpenAI used YouTube videos to train AI

Google and OpenAI used transcribed YouTube videos to train their proprietary AI language models, The New York Times reports. The technology companies are said to have violated the video platform's terms of use in doing so.

According to the American newspaper, OpenAI was looking for new sources of English text to train its AI language models at the end of 2021. The research company therefore developed Whisper, a speech recognition tool that can convert audio, for example from YouTube videos, to text. The company is said to have transcribed more than a million hours of YouTube videos using this tool. The filtered text from these videos was reportedly used to train the language model behind GPT-4. The New York Times also reported that OpenAI set up a team to determine how this action violates YouTube's terms of use. According to the newspaper, independent apps that are not affiliated with the video platform are not allowed to use its videos.

OpenAI was reportedly not the only company using YouTube videos to train AI models. According to five sources, YouTube itself also engaged in this practice. It is unclear to what extent YouTube did this, but according to the newspaper, the company certainly violated its own copyright policy. Google is also said to have amended its terms of use in 2023, allowing the company to use public Google Docs files, Google Maps reviews and other online materials to train AI models.

The New York Times said it was also able to obtain information about Meta, the parent company of Facebook, Instagram and WhatsApp. The company is said to have had plans to buy the American publishing house Simon & Schuster and thereby gain access to its books. These works could then be used to train Meta's AI language models.

Meta also reportedly held meetings in which the possibility of collecting copyrighted data from the Internet was discussed, even though this carried an increased risk of lawsuits. The company is said to have considered this course of action because negotiations with publishers, news outlets and artists would take a lot of time. It is not clear whether Meta actually went on to collect copyrighted material.

The New York Times filed a lawsuit against OpenAI and Microsoft for copyright infringement at the end of 2023. The American newspaper claimed at the time that the two technology companies had misused “millions” of articles to train their chatbots. In February 2024, OpenAI claimed that The New York Times had exploited a flaw in its AI models to make it appear that they infringed the newspaper's copyright.