Tech
Rani Molla

Meta’s not telling where it got its AI training data

Today Meta unleashed its ChatGPT competitor, Meta AI, across its apps and as a standalone. The company boasts that it is running on its latest, greatest AI model, Llama 3, which was trained on “data of the highest quality”! A dataset seven times larger than Llama 2’s! One that includes four times more code!

What is that training data? There the company is less loquacious.

Meta said the 15 trillion tokens on which it’s trained came from “publicly available sources.” Which sources? Meta told The Verge’s Alex Heath that it didn’t include Meta user data, but didn’t give much more in the way of specifics.

It did mention that it includes AI-generated data, or synthetic data: “we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.” There are plenty of known issues with synthetic or AI-created data, foremost of which is that it can exacerbate existing issues with AI, because it’s liable to spit out a more concentrated version of any garbage it is ingesting.

AI companies are turning to such data because there’s not enough good, public data on the entire internet to train their increasingly greedy AI models. (Meta had reportedly floated buying a publisher like Simon & Schuster to satisfy its insatiable data needs.)

Meta, of course, isn’t the only company that’s tight-lipped about where its AI data is coming from. In a now-infamous interview with WSJ’s Joanna Stern, OpenAI’s chief technology officer Mira Murati was unable to answer questions about what Sora, OpenAI’s video-generating app, was trained on. YouTube? Facebook? Instagram? She said she wasn’t sure.

More Tech

8%

Some 8% of kids ages 5-12 have interacted with AI chatbots like OpenAI’s ChatGPT or Google’s Gemini, according to a new Pew Research Center survey of their parents. While that’s nowhere near the usage rates of other devices like smartphones or even voice assistants, it’s still notable for a relatively new technology — especially one that’s already had devastating consequences for young people.

tech

Ives says he’s “relatively disappointed” in the price point of lower-cost Tesla models

On Tuesday, Tesla unveiled its long-awaited lower-cost cars, which turned out to be downgraded versions of the existing Model Y and Model 3. Tesla bull and Wedbush Securities analyst Dan Ives wasn’t particularly impressed with the price point, noting that it’s “still relatively high versus other vehicles on the market.”

The Model Y Standard and Model 3 Standard cost about $40,000 and $37,000, respectively. That’s more than the Model Y Premium and Model 3 Premium — what previous editions (or “trim levels”) are now called — cost last month, before the US federal government’s $7,500 tax credit expired. And the Standard models are missing a lot of Premium features, including Autopilot, second-row screens, and Tesla’s iconic glass roofs, among numerous other downgrades.

In other words, Tesla buyers will now be paying more for less, in what amounts to car-sized shrinkflation.

The stock closed down 4.5% yesterday on the news.

Ives doesn’t think it’s the end of the world but is “disappointed” in the price tag:

“We believe the launch of a lower cost model represents the first step to getting back to a ~500k quarterly delivery run-rate which will be important to stimulate demand for its fleet with the EV tax credit expiring at the end of September but we are relatively disappointed with this launch as the price point is only $5k lower than prior Model 3’s and Y’s.”

tech

Nvidia helps boost xAI funding round to $20 billion

xAI’s latest funding round has now doubled to $20 billion from $10 billion a month ago, thanks in part to backing from Nvidia, which invested $2 billion in the equity portion of the transaction, Bloomberg reports. In an interview with CNBC, Nvidia CEO Jensen Huang confirmed the investment, adding that his “only regret” was that he didn’t give xAI more money.

The mix of $7.5 billion in equity and $12.5 billion in debt will finance a special purpose vehicle that will purchase Nvidia chips that xAI will then rent. It’s one of many circular AI deals these days that are fueling chatter about an AI bubble, though others see them as a rational way for industry leaders to boost the potential size of the addressable market and lift their longer-term prospects in the process.

Investors in Elon Musk’s other company, Tesla, will vote next month at the company’s annual shareholder meeting on whether to invest in xAI as well, an outcome Musk has said he supports.

$1T
Jon Keegan

In the past few weeks, OpenAI has announced a flurry of massive deals with Oracle, Nvidia, CoreWeave, AMD, and others as hundreds of billions fly between technology partners racing to expand AI infrastructure at unprecedented scale. The Financial Times tallied it all up and found that the company has signed about $1 trillion worth of deals, and it isn’t clear at all that it will be able to fund them.

The “circular” nature of some of these arrangements is also one factor playing into fears that we’re in an AI bubble.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.