How well can top AI models do these jobs?

An OpenAI benchmark tests how well AI models can perform “economically valuable” jobs.

One of the biggest fears fueling the public’s apprehension toward AI is that the technology will eventually take their jobs.

We’ve already seen evidence that some roles like entry-level software development, customer service, and marketing are feeling the effects of automation powered by generative AI. Being able to track the real-world work capabilities of AI models will become increasingly important as models get more and more powerful.

To that end, OpenAI has created a new AI benchmark called “GDPval” that aims to measure just how well leading AI models can do realistic tasks for a variety of “economically valuable” jobs.

OpenAI describes the benchmark as an evolutionary step away from the first wave of benchmarks that followed a more academic, exam-style model:

“[GDPval] measures model performance on tasks drawn directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks. Evaluating models on realistic occupational tasks helps us understand not just how well they perform in the lab, but how they might support people in the work they do every day.”

Working with experienced industry professionals, the researchers created a dataset of 220 realistic tasks, drawn from 44 occupations, that someone might do in the course of their work in a particular role.

Here’s an example of one of the tasks in the benchmark’s training data for a real estate broker:

Sample task for a real estate broker from the GDPval benchmark’s training dataset (Huggingface.co)

We went through the data and picked a few common jobs from the benchmark’s results. Unsurprisingly, software development was the most affected job, with Anthropic’s Claude model earning an average 70% win rate when its work was compared with that of a human in the role. For reference, a win rate of 50% would put the model on par with a human expert. Audio and video technicians can feel that their jobs are secure (for now), as the models scored very low on those tasks.

OpenAI acknowledges that the benchmark has limitations. For instance, each task currently comes with background materials that are required to do it, but generating those materials is itself complex work, and the benchmark doesn’t assess current models’ ability to complete that preparatory step. Instead, that work is done by the humans testing the AI. The paper also notes that this is a small dataset, and that the jobs currently tested are mainly “knowledge worker” roles that can be performed on a computer.

Maybe a future version will be used to test how well a robot can scrub your toilet.

More Tech

Tesla is back in the negative this year

After falling more than 6% yesterday in its biggest drop since July, Tesla is once again in negative territory for the year. Elon Musk’s company posted record earnings last month, buoyed by pulled-forward demand tied to the final quarter of US federal EV tax credits, but its margins slipped as steep discounts were used to clear inventory.

Now the stock, which only turned positive for the year in September, is under renewed pressure amid a broader tech and AI sell-off, as investors grow concerned that the Federal Reserve may pause its rate-cutting cycle. Adding to the drag are soft sales in Tesla’s second-largest market, China, and news that longtime bull Cathie Wood’s Ark Invest unloaded roughly $30 million in shares this week.

Rani Molla

Meta overhauls Marketplace with AI insights and collaborative shopping

Meta announced Thursday that it’s giving its buy-and-sell platform, Marketplace (arguably the best part of Facebook and the most appealing to young people), a “glow up.” Each day in the US and Canada, one in four of Facebook’s daily active young adult users visits Marketplace, according to Meta. The overhaul includes the ability to create collections of listings you can share with friends or the public.

The site will also offer AI suggestions on what to ask sellers about your potential purchase. Unfortunately for all involved, the much-hated, easy-to-accidentally-press default message to sellers — “Hi, is this available” — remains unchanged.

Most promising, to us, for comedic purposes: “You can now react and comment directly on Marketplace listings, helping others learn about item quality and discover unique finds.”

$15B
Rani Molla

Tesla CEO Elon Musk’s other company, xAI, has raised $15 billion in its latest funding round, CNBC reports. That’s $5 billion more than the company had raised in that same round in September. Its valuation remains at a sky-high $200 billion.

Tesla shareholders recently voted to invest in xAI but, due to a large number of abstentions, the board has yet to approve the proposal.

Rani Molla

Microsoft to use OpenAI’s chips to improve its own in-house chips

As part of Microsoft’s investment in OpenAI, the company is using OpenAI’s development of custom AI semiconductors to help improve its own in-house chips, which have lagged behind peers, CEO Satya Nadella said in an interview with podcaster Dwarkesh Patel.

“As they innovate even at the system level, we get access to all of it,” Nadella said. “We first want to instantiate what they build for them, but then we’ll extend it.” Under their updated agreement, Microsoft has access to OpenAI’s models and products — excluding the Jony Ive-designed AI device — through 2032.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.