Tech
Rani Molla

Meta’s not telling where it got its AI training data

Today Meta unleashed its ChatGPT competitor, Meta AI, across its apps and as a standalone product. The company boasts that it is running on its latest, greatest AI model, Llama 3, which was trained on “data of the highest quality”! A dataset seven times larger than Llama 2’s! And it includes four times more code!

What is that training data? There the company is less loquacious.

Meta said the 15 trillion tokens on which Llama 3 was trained came from “publicly available sources.” Which sources? Meta told The Verge’s Alex Heath that the training data didn’t include Meta user data, but it didn’t give much more in the way of specifics.

It did mention that it includes AI-generated data, or synthetic data: “we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.” There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it’s liable to spit out a more concentrated version of any garbage it is ingesting.

AI companies are turning to such data because there’s not enough good, public data on the entire internet to train their increasingly greedy AI models. (Meta had reportedly floated buying a publisher like Simon & Schuster to satisfy its insatiable data needs.)

Meta, of course, isn’t the only company that’s tight-lipped about where its AI data is coming from. In a now-infamous interview with WSJ’s Joanna Stern, OpenAI’s chief technology officer Mira Murati was unable to answer questions about what Sora, OpenAI’s video-generating app, was trained on. YouTube? Facebook? Instagram? She said she wasn’t sure.

More Tech

tech

Report: OpenAI’s Stargate has been a chaotic mess

Just over a year ago, OpenAI CEO Sam Altman stood alongside President Trump, Oracle’s Larry Ellison, and SoftBank CEO Masayoshi Son to announce an ambitious $500 billion plan to build massive data centers in the US — Project Stargate.

While an actual 1-gigawatt Stargate data center is now well under construction in Abilene, Texas, it turns out there wasn’t much of a plan in place at the time of the announcement, according to a new report from The Information.

The past year has been full of partner disputes, debt problems, and scuttled plans as the loosely defined project races to build the AI computing infrastructure that OpenAI is craving as competition heats up.

Per the report, OpenAI tried to build its own data centers as the project stalled, but lenders balked at funding the risky effort. The partners eventually settled on the current plan, in which Oracle borrows the money and leases capacity back to OpenAI. OpenAI was still able to control the design of the facility.

The slow start for the project resulted in OpenAI missing its own goal of 10 gigawatts of AI computing capacity from Oracle and SoftBank by the end of 2025.

tech

Ives says AI represents huge opportunity for cybersecurity firms as losses mount

Cybersecurity stocks continued to slide Monday, after Anthropic unveiled a new security feature for its AI model Friday. The company’s AI advancements have been wreaking havoc across software firms, and its latest foray appears to be doing the same to cybersecurity leaders, including CrowdStrike, Zscaler, and Cloudflare.

But similar to Dan Ives’ broader thesis on the software sell-off — which he has called “overblown,” arguing that the companies getting hit may ultimately become “core participants in the AI Revolution” — the Wedbush Securities analyst says AI is actually a positive for cybersecurity stocks.

“Anthropic going after this market with an initial tool validates our thesis that cyber security is the next frontier for the AI Revolution,” Ives wrote Monday morning, arguing that AI is elevating the risk environment — and the need for cybersecurity firms in the first place.

“AI will be a major tailwind to the cyber security sector over the coming years as protection of use cases, data, and endpoints expand markedly,” he said, adding that companies including CrowdStrike and Zscaler are well positioned to capitalize on the shift by incorporating AI into their strategies.

tech

Analysts slash Salesforce price targets ahead of Wednesday earnings, as narrative of AI eating its lunch persists

A number of analysts have significantly lowered their price targets for Salesforce, citing growing fears that AI workforce tools, including Anthropic’s Cowork, could threaten parts of its core business. According to reports, here are some of the recent cuts:

  • Morgan Stanley cut its price target nearly 30%, to $287 from $398.

  • Jefferies slashed its forecast 33%, to $250 from $375.

  • Barclays reduced its price target to $265 from $338.

  • Evercore ISI went to $260 from $340.

  • Last week, Citigroup also reduced its price target to $197 from $257.
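
For readers who want to check the math, the size of each cut can be worked out from the old and new targets. The short Python sketch below does that. The names `price_targets` and `percent_cut` are illustrative and do not come from any analyst report, and the figures are simply the ones listed above.

```python
# Price targets as listed above: (old target, new target) in US dollars.
# The dictionary name and layout are illustrative, not from the reports.
price_targets = {
    "Morgan Stanley": (398, 287),
    "Jefferies": (375, 250),
    "Barclays": (338, 265),
    "Evercore ISI": (340, 260),
    "Citigroup": (257, 197),
}


def percent_cut(old: float, new: float) -> float:
    """Return the reduction as a percentage of the old target."""
    return (old - new) / old * 100


# Compute the cut for every firm in the list.
cuts = {firm: percent_cut(old, new) for firm, (old, new) in price_targets.items()}

for firm, cut in cuts.items():
    print(f"{firm}: {cut:.1f}% cut")
```

Run as written, this puts Morgan Stanley’s cut at about 27.9% and Jefferies’ at about 33.3%, in line with the “nearly 30%” and “33%” figures above. The other three cuts each work out to somewhere between 21% and 24%.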

Earlier this month, Wedbush Securities analyst Dan Ives offered a different view, adding Salesforce to his list of top 30 AI companies and calling the stock a “core participant” in the “AI revolution.” He described the recent software sell-off as “overblown.”

Shares of Salesforce, which reports earnings Wednesday, are down 30% year to date and 1% premarket today.

tech

Report: Amazon’s AI bots have been behind multiple AWS outages

Amazon’s AI tool Kiro, which launched in July and can code autonomously, was behind a 13-hour interruption to Amazon Web Services in December, according to reporting by the Financial Times.

The FT reports that the company’s AI tools have caused AWS service disruptions at least twice in recent months.

In the December outage, which Amazon called an “extremely limited event” that did not have an impact on customer-facing service, engineers allowed Kiro to make changes and the tool opted to “delete and recreate the environment.”

Amazon has a closely tracked internal target that 80% of its developers use AI to code once a week, employees told the FT. The company says the December incident was a “user access control issue” and not an issue with Kiro’s permissions.

AWS accounted for 57% of Amazon’s operating profit in 2025. In December, following a larger outage months earlier, AWS and Google announced a partnership to attempt to prevent massive network outages.

Update, February 20, 5:50 p.m. ET: In a statement to Sherwood News, an AWS spokesperson disputed the report, writing:

“These brief events were the result of user error — specifically misconfigured access controls — not AI. The December service interruption was an extremely limited event when a single service (AWS Cost Explorer — which helps customers visualize, understand, and manage AWS costs and usage over time) in one of our two Regions in Mainland China was affected. This event didn’t impact compute, storage, database, AI technologies, or any other of the hundreds of services that we run. We are not aware of any related customer inquiries resulting from this isolated interruption. Following these events, we implemented numerous additional safeguards, including mandatory peer review for production access, enhanced training on AI-assisted troubleshooting, and resource protection measures. Kiro puts developers in control — users need to configure which actions Kiro can take, and by default, Kiro requests authorization before taking any action.”

$830B

OpenAI is finalizing commitments on a funding round that could climb beyond $100 billion at a valuation of $830 billion, according to a report from The Information.

Per The Information, SoftBank is expected to invest $30 billion into the ChatGPT maker, spread across the year in three installments of $10 billion. Up to $50 billion could come from Amazon and $30 billion from Nvidia (up from the $20 billion Bloomberg reported earlier this month). An additional investment in the low billions could come from Microsoft.

OpenAI was last valued at $500 billion following a fundraising round completed in October. Earlier this month, its rival Anthropic took in $30 billion from investors including Microsoft and Nvidia at a $380 billion valuation.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.