Meta scrambling to defend its AI after Llama 4 benchmark bungle

This weekend, Meta surprised everyone and released two flavors (“Maverick” medium and “Scout” small) of its highly anticipated Llama 4 AI model. Llama 4’s release is a big deal, as the company has been hyping it up as the key to its AI plans in the coming year.

When a major new model drops, people do two things: check to see how the model scored on major benchmarks, and load up the model and kick the tires.

Llama 4 posted some eye-popping results on Chatbot Arena, a popular human-powered benchmark that works as a sort of blind taste test for AI models, with side-by-side results. But after reading the fine print, some in the community cried foul: Meta had achieved the higher score using an “experimental chat version” of Llama 4 that was not available to the public.

A footnote to a chart that highlighted Llama 4’s standout score read “LMArena testing was conducted using Llama 4 Maverick optimized for conversationality.”

In response to the controversy, LMArena (which runs the Chatbot Arena benchmark) updated its guidelines for testing:

“Meta’s interpretation of our policy did not match what we expect from model providers. Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model to optimize for human preference. As a result of that we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

This led to some unfounded accusations that Meta had trained its model on test datasets — akin to giving a kid the answers to a quiz before having them take the test.

To quell the firestorm of questions surrounding the model’s release, Meta’s head of generative AI, Ahmad Al-Dahle, denied the claims in a post on X yesterday.

The release was also unusual for what was missing: the extra-large version of the model, named “Behemoth.” Meta said the model was still being trained, but boasted about its performance nonetheless.

“Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.”

Meta did not immediately respond to a request for comment.

More Tech

Meta buys chip startup Rivos in effort to lower its reliance on Nvidia

Meta is buying AI chip startup Rivos for an undisclosed sum, as part of the social media company’s effort to decrease its reliance on graphics processing units from Nvidia, Bloomberg reports. Rivos was seeking funding in August at a $2 billion valuation. Meta has been spending exorbitant sums in an attempt to create AI models that are smarter than humans, an effort that has involved investing in developing its own AI chips.

⚡️ +267% ⚡️

A new analysis by Bloomberg looked at wholesale electricity prices and found that in the past five years, areas near data centers saw their prices spike as much as 267%. More than 70% of the price increases took place in areas less than 50 miles from a data center.

As tech companies race to build colossal data centers, the projects’ unprecedented energy demands are pushing some of the costs onto consumers.

OpenAI’s first-half 2025 sales were 16% higher than all of 2024

OpenAI brought in $4.3 billion in revenue in the first half of this year, 16% higher than its total revenue in 2024, The Information reports, citing financial disclosures to shareholders. The ChatGPT maker also burned through $2.5 billion in the same time frame.

Currently, the company is generating more than $1 billion in revenue each month, which puts it on track to reach its full-year projection of $13 billion in revenue and $8.5 billion in cash burn, a paltry sum compared to the $115 billion it expects to burn through 2029.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.