Jon Keegan

Meta used a pirated library of millions of books and papers to train its AI because they thought everybody was doing it

In January, we learned from internal Meta communications revealed in a copyright lawsuit that the company downloaded LibGen, a massive collection of pirated, copyrighted works including millions of books and academic papers, to train its Llama AI model. This legally dubious move was approved by “MZ.”

More details about this consequential decision are emerging as the lawsuit plays out. New court filings detail internal deliberations at Meta among researchers who knew that using pirated works was a big no-no but did it anyway because they suspected their competitors were using the archive, too. Meta employees wrote:

“everyone is using lib-gen (startups, but also google, openAI)”

“And I’m pretty sure other folks have no issues taking all of libgen 😊”

The Atlantic took a deeper look at what exactly is in this dataset. Using a “snapshot” of the archive (just a list of what is in there, not the works themselves), they created a search tool you can use to find exactly what works were in the archive. Authors who found that their works were in the dataset have taken to social media to express their outrage.

More Tech

Amazon cuts another 16,000 roles after laying off 14,000 workers in October

Amazon announced Wednesday that it’s cutting 16,000 roles across the company, having laid off 14,000 workers only three months ago.

“As I shared in October, we’ve been working to strengthen our organization by reducing layers, increasing ownership, and removing bureaucracy,” Senior Vice President of People Experience and Technology Beth Galetti wrote in the press release. “While many teams finalized their organizational changes in October, other teams did not complete that work until now.”

CEO Andy Jassy previously said that the October layoffs were “about culture” rather than AI-related cost cutting. Galetti says layoffs, now totaling 30,000, won’t become a regular occurrence.

“Some of you might ask if this is the beginning of a new rhythm — where we announce broad reductions every few months. That’s not our plan.”

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.