New York Times: Google and OpenAI used YouTube videos to train AI - IT Pro

Google and OpenAI used transcribed YouTube videos to train their proprietary AI language models, The New York Times reports. In doing so, the technology companies are said to have violated the video platform's terms of use.

According to the American newspaper, OpenAI was looking for new sources of English text to train its AI language models at the end of 2021. The research company therefore developed Whisper, a speech recognition tool that can convert audio, for example from YouTube videos, to text. The company is said to have transcribed more than a million hours of YouTube videos using this tool. The filtered text from these videos was reportedly used to train the language model behind GPT-4. The New York Times also reports that OpenAI set up a team to determine how this practice violates YouTube's terms of use. According to the newspaper, independent apps that are not affiliated with the video platform are not allowed to use its videos.

OpenAI was reportedly not the only company using YouTube videos to train AI models. According to five sources, YouTube itself is said to have done the same. It is unclear to what extent YouTube did this, but according to the newspaper, the company did violate its own copyright policy. Google is also said to have amended its terms of use in 2023, allowing the company to use public Google Docs files, Google Maps reviews and other online materials to train AI models.

The New York Times said it was also able to gather information about Meta, the parent company of Facebook, Instagram and WhatsApp. The company is said to have had plans to buy the American publishing house Simon & Schuster and thereby acquire its books. These works could then be used to train Meta's AI language models.

Meta also reportedly held meetings to discuss the possibility of collecting copyrighted data from the internet, even though this carried an increased risk of lawsuits. The company is said to have considered this course of action because negotiating with publishers, news outlets and artists would take a lot of time. It is not clear whether Meta actually went on to collect copyrighted material.

The New York Times filed a lawsuit against OpenAI and Microsoft for copyright infringement at the end of 2023. At the time, the American newspaper claimed that the two technology companies had misused “millions” of articles to train their chatbots. In February 2024, OpenAI claimed that The New York Times had exploited a flaw in its AI models to make it appear that the newspaper's copyright had been infringed.

Maya Angelou

