Tech

How well can top AI models do these jobs?

An OpenAI benchmark tests how well AI models can perform “economically valuable” jobs.

One of the biggest fears fueling the public’s apprehension toward AI is that the technology will eventually take their jobs.

We’ve already seen evidence that some roles like entry-level software development, customer service, and marketing are feeling the effects of automation powered by generative AI. Being able to track the real-world work capabilities of AI models will become increasingly important as models get more and more powerful.

To that end, OpenAI has created a new AI benchmark called “GDPval” that aims to measure just how well leading AI models can do realistic tasks for a variety of “economically valuable” jobs.

OpenAI describes the benchmark as an evolutionary step away from the first wave of benchmarks that followed a more academic, exam-style model:

“[GDPval] measures model performance on tasks drawn directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks. Evaluating models on realistic occupational tasks helps us understand not just how well they perform in the lab, but how they might support people in the work they do every day.”

Working with experienced industry professionals, the researchers created a dataset of 220 realistic tasks from 44 occupations that someone might do in the course of their work in a particular role.

Here’s an example of one of the tasks in the benchmark’s training data for a real estate broker:

Sample task for a real estate broker from the GDPval benchmark’s training dataset (Huggingface.co)

We went through the data and picked a few common jobs from the benchmark’s results. Unsurprisingly, software developers were the most affected job, with Anthropic’s Claude model earning an average 70% win rate on those tasks. The win rate comes from comparing the model’s work with that of a human in the role, so a score of 50% would put the model on par with a human expert. Audio and video technicians can feel that their jobs are secure (for now), as the models scored very low on those tasks.
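To make the win-rate idea concrete, here is a minimal sketch in Python of how such a score could be tallied from head-to-head verdicts. The `win_rate` function, the verdict labels, and the sample data are hypothetical, and counting a tie as half a win is an assumption made for illustration, not necessarily how GDPval scores its results.

```python
from collections import Counter

# Hypothetical verdict labels: which output a grader preferred on a task.
VALID_VERDICTS = {"model", "human", "tie"}


def win_rate(verdicts):
    """Return the model's share of head-to-head comparisons.

    Assumption for illustration: a tie counts as half a win, so a
    model that is indistinguishable from the human lands at 0.5.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    return (counts["model"] + 0.5 * counts["tie"]) / len(verdicts)


# Made-up verdicts for ten tasks in a single occupation.
sample = ["model"] * 7 + ["human"] * 3
print(f"win rate: {win_rate(sample):.0%}")
```

Under this convention, a model that wins exactly as often as it loses scores 50%, which matches the “on par with a human expert” reading above.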

OpenAI acknowledges there are limitations with this benchmark. For instance, each task currently comes with background materials that are required to do the task, but generating those materials itself requires complex work, and the benchmark doesn’t assess current models’ ability to complete that necessary preparation. Instead, that work is done by the humans testing the AI. The paper also notes that this is a small dataset, and the jobs currently tested are mainly those of “knowledge workers” that can be performed on a computer.

Maybe a future version will be used to test how well a robot can scrub your toilet.

More Tech

tech

Cybertruck battery material supplier writes down Tesla deal by 99%

South Korea’s L&F Co., a supplier of battery material for Tesla’s “apocalypse-proof” Cybertruck, has written down the value of its Tesla contract by more than 99%, Bloomberg reports — another sign that Cybertruck sales are faltering.

The company cited changes in supply quantities, slashing a contract valued at nearly $3 billion in 2023 to about $7,000 now.

tech

Estimates for Tesla’s Q4 deliveries are declining

Analysts across the board expect Tesla’s fourth-quarter deliveries to decline from last year, as a market that saw record deliveries fueled by the approaching end of the EV tax credit comes to grips with the credit’s actual end. And as the end of the quarter nears, estimates have sunk further.

Currently the FactSet consensus estimate expects Tesla to deliver 449,000 vehicles in Q4, down 9.5% from last year’s 496,000 and down from 450,000 earlier this month. Bloomberg now pegs the number at 445,000, down from a 448,000 consensus estimate at the start of December.

Prediction markets are even less bullish. The market-implied odds derived through event contracts show that less than a quarter of traders believe Tesla will surpass 430,000 deliveries in the quarter ending December. The actual delivery numbers are expected to be released in early January.

(Event contracts are offered through Robinhood Derivatives, LLC — probabilities referenced or sourced from KalshiEx LLC or ForecastEx LLC.)

tech
Jon Keegan

Chinese AI chatbots reportedly must answer 2,000 questions, prove censorship compliance

For American companies building AI today, it’s basically a free-for-all, a self-regulation zone with zero federal restrictions.

But for Chinese AI companies, the Chinese Communist Party exerts strict control over what models get released and what questions they cannot answer.

A report in The Wall Street Journal details the rigorous tests that AI models are subjected to before being released on the global stage to compete with Western AI models.

AI models must answer 2,000 questions that are frequently updated and achieve a 95% refusal rate for queries related to forbidden topics, like the Tiananmen Square massacre or human rights violations, according to the report.

The strict regulatory framework does have some safety advantages, such as preventing chatbots from sharing violent or pornographic material and protecting users from self-harm, an issue that American AI companies are currently wrestling with.

tech

Report: OpenAI has started mocking up what ads in ChatGPT could look like

2025 saw OpenAI ink a flurry of massive deals. To pay for it all, the company has realized that it can’t get there on $20-per-month subscriptions alone; it also needs to monetize its hundreds of millions of free users.

To this end, although OpenAI has repeatedly denied that ads are coming to ChatGPT, a new report says the company is actually working through the details.

Citing people familiar with the discussions, The Information reports employees have discussed different ways to prioritize sponsored information in ChatGPT in response to relevant queries.

Since ChatGPT burst onto the scene in late 2022, its offerings have been ad-free, relying instead on a freemium subscription model. But with Google recently telling advertisers it plans to bring ads to Gemini next year, and with OpenAI burning through truckloads of cash, the pressure to follow suit is growing.

OpenAI is looking at its AI model-developing competitors Meta and Google, which are pulling in hundreds of billions of dollars per year in advertising revenue, to arrive at this conclusion. It’s also seemingly inspired by Amazon’s (and Google’s) idea of sponsored product placement.

Per the report, in addition to trying to build new kinds of ad units, OpenAI is considering a few options:

  • Leaning into chats that are clearly about buying a product and giving priority placement to sponsored results — though this works out to only about 2.1% of queries, according to OpenAI.

  • Showing ads based on the treasure trove of information it has on users, by mining their chat histories.

  • A “sponsored” sidebar showing ads related to the conversation.

But the company realizes it has to be careful to not turn off users, who might not trust a chatbot that peppers sensitive conversations with ads.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.