How do AI models stack up vs. humans on standardized benchmarks?

AI is getting good at a lot of different tasks

4/17/24 10:46AM

The state of AI

As AI has gone mainstream, companies haven’t been shy in deploying the technology — applying it to manual and repetitive tasks and even citing it as a reason for mass layoffs.

But how does AI compare to humans on technical tasks? A new report, Stanford University’s 2024 AI Index, summarizes where the burgeoning technology is at.

The headline is that recent breakthroughs have heralded an unprecedented improvement in the performance of AI models on benchmark tests. For a long time, AI has been able to tell what’s in a picture, even as websites ask us to endlessly prove we’re not a robot by clicking on images of traffic lights or stop signs.

But now, AI is doing visual reasoning and math — seriously hard math.

Indeed, the 2024 AI Index reports that models have gone from scoring less than 10% of the relative performance of humans to more than 90% in just 2 years in competition-level math. In more simple tasks, the AI models evaluated already outperform the relevant human benchmarks.

The good news for anyone worried about losing their job is that AI researchers are increasingly concerned about running out of high-quality data to train their models, with some predicting that the available supply will be exhausted by 2026. This shortage might force developers to depend increasingly on AI-generated, or 'synthetic', data for training new models. Adobe’s solution? Pay people $3 a minute for videos of them touching things.

Related reading: How ChatGPT broke the Turing test.

Rani Molla

Search and Destroy

Ads are showing up on Google’s AI Mode now

Hope it doesn’t go the same route as Google Search!

Rani Molla

Waitlist

Tesla Robotaxi demand outpacing supply in Austin and San Francisco

The service finally became available to the public this week.

Rani Molla6h

Even OpenAI is worried about Google’s Gemini 3

When OpenAI’s ChatGPT burst onto the scene in November 2022, it sent shock waves through Silicon Valley’s biggest names. Google, Microsoft, and Amazon had all been developing generative AI, but OpenAI’s breakthrough sparked an all-out race to catch up. Until now.

It seems that OpenAI CEO Sam Altman is feeling the heat from Google, whose newly released Gemini 3 has been receiving stellar reception from AI leaderboards, analysts, and consumers alike.

“We know we have some work to do but we are catching up fast,” OpenAI CEO Sam Altman told colleagues last month, after learning about Google’s AI advances, The Information reports. “I expect the vibes out there to be rough for a bit.”

Google’s AI progress, Altman said, could “create some temporary economic headwinds for our company,” but he said OpenAI would emerge on top.

However, it’s worth remembering that, despite OpenAI’s first-mover advantage and supersized valuation, Google is a substantial adversary that is peppering its AI models across its giant existing — and highly lucrative — product suite.

Altman Memo Forecasts ‘Rough Vibes’ Due to Resurgent Google

It seems that OpenAI CEO Sam Altman is feeling the heat from Google, whose newly released Gemini 3 has been receiving stellar reception from AI leaderboards, analysts, and consumers alike.

Google’s AI progress, Altman said, could “create some temporary economic headwinds for our company,” but he said OpenAI would emerge on top.