AI companies are sucking in YouTube subtitles for training
An investigation by Proof News found that major tech companies, including Anthropic, Nvidia, Apple, and Salesforce, have been using subtitles from YouTube videos to train their AI models. The training dataset, which consisted of subtitles from 173,536 videos across 48,000 channels, included content from creators like MrBeast, PewDiePie, TED, and Khan Academy, among others. Those creators didn't necessarily give permission or get paid. Earlier this year, the New York Times found that OpenAI, which has consistently avoided fessing up, also used YouTube data to train its AI.
