Tech
Robot Among City Ruins (CSA Images)

How well can top AI models do these jobs?

An OpenAI benchmark tests how well AI models can perform “economically valuable” jobs.

One of the biggest fears fueling the public’s apprehension toward AI is that the technology will eventually take their jobs.

We’ve already seen evidence that some roles like entry-level software development, customer service, and marketing are feeling the effects of automation powered by generative AI. Being able to track the real-world work capabilities of AI models will become increasingly important as models get more and more powerful.

To that end, OpenAI has created a new AI benchmark called “GDPval” that aims to measure just how well leading AI models can do realistic tasks for a variety of “economically valuable” jobs.

OpenAI describes the benchmark as an evolutionary step away from the first wave of benchmarks that followed a more academic, exam-style model:

“[GDPval] measures model performance on tasks drawn directly from the real-world knowledge work of experienced professionals across a wide range of occupations and sectors, providing a clearer picture on how models perform on economically valuable tasks. Evaluating models on realistic occupational tasks helps us understand not just how well they perform in the lab, but how they might support people in the work they do every day.”

Working with experienced industry professionals, the researchers created a dataset of 220 realistic tasks across 44 occupations, each one something a person in that role might do in the course of their work.

Here’s an example of one of the tasks in the benchmark’s training data for a real estate broker:

Sample task for a real estate broker from the GDPval benchmark’s training dataset (Huggingface.co)

We went through the data and picked a few common jobs from the benchmark’s results. Unsurprisingly, software developers were the most affected occupation: Anthropic’s Claude model earned an average 70% win rate when its output was compared with that of a human in that role. For reference, a score of 50% would put the model on par with a human expert. Audio and video technicians should feel that their job is secure (for now), as the models scored very low on those tasks.
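To make the win-rate figure concrete, here is a small Python sketch of how such a score could be computed from head-to-head comparisons. The verdict labels, the sample data, and the choice to count a tie as half a win are assumptions made for illustration, not details of GDPval’s actual grading.

```python
from collections import Counter

# Hypothetical verdict labels; these are not taken from the benchmark itself.
VALID_VERDICTS = {"model", "human", "tie"}


def win_rate(verdicts: list[str]) -> float:
    """Share of comparisons favoring the model.

    Assumption: a tie counts as half a win, so an even split scores 0.5.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    return (counts["model"] + 0.5 * counts["tie"]) / len(verdicts)


# Made-up grader verdicts for ten tasks: 6 model wins, 2 ties, 2 human wins.
sample = ["model"] * 6 + ["tie"] * 2 + ["human"] * 2
print(win_rate(sample))
```

Under this scoring, a model that wins exactly as often as it loses lands at 0.5, which matches the article’s description of 50% as parity with a human expert.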

OpenAI acknowledges that the benchmark has limitations. For instance, each task currently comes with background materials that are required to complete it, but generating those materials itself requires complex work, and the benchmark doesn’t assess current models’ ability to complete those preparatory tasks. Instead, that work is done by the humans testing the AI. The paper also notes that this is a small dataset, and the jobs currently tested are mainly those of “knowledge workers” whose work can be performed on a computer.

Maybe a future version will be used to test how well a robot can scrub your toilet.

More Tech

tech

Tesla’s 45 Austin Robotaxis now have 14 crashes on the books since launching in June

Since launching in June 2025, Tesla’s 45 Austin Robotaxis have been involved in 14 crashes, per Electrek reporting citing National Highway Traffic Safety Administration data.

Electrek’s analysis found that the vehicles have traveled roughly 800,000 paid miles in that period, which works out to a crash about every 57,000 miles. According to the NHTSA, US drivers crash once every 500,000 miles on average.
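The per-mile comparison is simple division. A minimal Python sketch using only the figures quoted in this article (14 crashes, roughly 800,000 paid miles, and the 500,000-mile average for US drivers) reproduces it. The function and variable names are illustrative, and the mileage is an estimate rather than an exact count.

```python
def miles_per_crash(total_miles: float, crashes: int) -> float:
    """Average distance driven between crashes."""
    if crashes <= 0:
        raise ValueError("crashes must be positive")
    return total_miles / crashes


# Figures as quoted in the article (the mileage is an estimate).
ROBOTAXI_MILES = 800_000
ROBOTAXI_CRASHES = 14
US_DRIVER_MILES_PER_CRASH = 500_000

robotaxi_rate = miles_per_crash(ROBOTAXI_MILES, ROBOTAXI_CRASHES)
# How many times more often, per mile, the Robotaxis crashed than the quoted average.
ratio = US_DRIVER_MILES_PER_CRASH / robotaxi_rate

print(f"Robotaxi: one crash every {robotaxi_rate:,.0f} miles")
print(f"Crash frequency relative to the quoted US average: {ratio:.2f}x")
```

On these inputs the result is about 57,000 miles per crash, or crashes occurring roughly nine times as often per mile as the quoted US average.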

The article says Tesla submitted five new crash reports in January of this year, covering incidents that happened in December and January. Electrek wrote:

“The new crashes include a collision with a fixed object at 17 mph while the vehicle was driving straight, a crash with a bus while the Tesla was stationary, a collision with a heavy truck at 4 mph, and two separate incidents where the Tesla backed into objects, one into a pole or tree at 1 mph and another into a fixed object at 2 mph.”

Tesla updated a previously reported crash that was originally filed as only having damaged property to include a passenger’s hospitalization.

Last month, Tesla shares climbed after CEO Elon Musk said in a post on X that the company’s Austin Robotaxis had begun operating without a safety monitor.

tech
Jon Keegan

Ahead of IPO, Anthropic adds veteran executive and former Trump administration official to board

Anthropic is moving to put the pieces in place for a successful IPO this year.

Today, the company announced that Chris Liddell would join its board of directors.

Liddell is a seasoned executive who previously served as CFO for Microsoft, GM, and International Paper.

Liddell also comes with experience in government, having served as the deputy White House chief of staff during the first Trump administration.

Ties to the Trump world could be helpful for Anthropic as it pushes to enter the public market. It’s reportedly not on the greatest terms with the current administration, as the startup has pushed back on using its Claude AI for surveillance applications.

tech
Rani Molla

Meta is bringing back facial recognition for its smart glasses

Meta is reviving its highly controversial facial recognition efforts, with plans to incorporate the tech into its smart glasses as soon as this year, The New York Times reports.

In 2021, around the time Facebook rebranded as Meta, the company shut down the facial recognition software it had used to tag people in photos, saying it needed to “find the right balance.”

Now, according to an internal memo reviewed by the Times, Meta seems to feel that it’s at least found the right moment, noting that the fraught and crowded political climate could allow the feature to attract less scrutiny.

“We will launch during a dynamic political environment where many civil society groups that we would expect to attack us would have their resources focused on other concerns,” the document reads.

The tech, called “Name Tag” internally, would let smart glasses wearers identify people they see and surface information about them using Meta’s artificial intelligence assistant.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.