A screenshot of OpenAI’s “Operator” agent (OpenAI)
SMOOTH OPERATOR

OpenAI’s “Operator” is here to slowly take over your computer and mess up your life

Operator made a consequential mistake 13% of the time in early testing, such as emailing the wrong person or messing up a reminder for a person to take medication.

Jon Keegan
1/24/25 10:21AM

OpenAI released a “research preview” of its AI agent that can control your web browser. Called “Operator,” it has the ability to control your mouse and keyboard and analyze things it “sees” on your computer — very, very slowly. Currently it’s only available to ChatGPT Pro users in the US.

Operator makes use of the multistep “reasoning” found in OpenAI’s o1 model and the multimodal “vision” capabilities of GPT-4o. This reasoning process achieves better (but slower) performance by breaking tasks into steps. Lots and lots of steps.

In the video demonstrations shared on the product page, you can watch Operator break a task down into dozens of distinct actions like “clicking,” “typing,” and “scrolling.” One example took 152 steps to complete a grammar quiz; another took 146 steps to determine the amount of a refund from a canceled online order.

Screenshot from demo of OpenAI Operator
(OpenAI)

OpenAI positions this kind of freewheeling, on-demand AI web browsing as an agent that can save you the drudgery of ordering groceries, researching holidays, making restaurant reservations, or buying concert tickets.

Operator makes high-stakes mistakes

It’s one thing when ChatGPT spits out an incorrect answer, but if your chatbot is actually spending your money and triggering things in the real world, the stakes are much, much higher.

In one test of 100 sample tasks, OpenAI found that Operator made a consequential mistake 13% of the time, such as emailing the wrong person, incorrectly bulk-removing email labels, setting the wrong date for a reminder to take the user’s medication, or ordering the wrong food item. Some of the other mistakes were easily reversible “nuisances.” OpenAI noted that after mitigations, it reduced this error rate by approximately 90%.
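
As a rough check on those figures, and assuming the approximately 90% reduction applies directly to the 13% consequential-mistake rate (the article does not spell this out), the remaining rate would be on the order of 1%. A minimal Python sketch of that arithmetic, using a hypothetical helper name:

```python
# Back-of-the-envelope arithmetic only. Assumption: the ~90% reduction is a
# relative reduction applied to the 13% consequential-mistake rate.

def residual_rate(baseline: float, reduction: float) -> float:
    """Return the error rate left after a relative reduction."""
    return baseline * (1.0 - reduction)

baseline = 0.13   # consequential-mistake rate in the 100-task test (from the article)
reduction = 0.90  # approximate relative reduction after mitigations (from the article)

rate = residual_rate(baseline, reduction)
print(f"{rate:.1%}")  # prints 1.3%
```

Under that reading, roughly one task in 100 would still end in a consequential mistake, which is small but not negligible for an agent that can send email or place orders.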

OpenAI stresses that you have the ability to grab the wheel from the AI at any time, and you can approve any action before it is executed, but in this early evaluation version, you’ll probably have to spend more time babysitting the agent than just going ahead and doing the task on your own.

For now, OpenAI limits the tasks you can use Operator for, prohibiting banking and job applications.

OpenAI shared a list of example tasks that some hypothetical user might want an AI to do for them. Ten out of ten times Operator was able to research bear habitats, create a grocery list, and make a ’90s playlist on Spotify.

Medium persuasion

The system card for the model behind Operator — Computer-Using Agent (CUA) — describes the process OpenAI used to assess the risks of letting a prerelease, novel AI agent go hog wild with your computer.

As with other model releases, OpenAI tested the model using red teams with expertise in social engineering, CBRN (chemical, biological, radiological, and nuclear) threats, and cybersecurity. OpenAI gave itself a “low” risk rating for everything except “persuasion,” which got a “medium” risk score, a level it considers safe enough for public release.

High consequence

But there are some important restrictions on how you can use Operator. Because there is a slightly elevated risk of Operator being used to influence people, the usage policy prohibits impersonating people or organizations, concealing the role of AI in tasks, or using it to spread disinformation or false interactions, like fake reviews or fake profiles.

OpenAI prohibits people from using Operator to commit any crimes, but you are also prohibited from using it to bully, harass, defame, or discriminate against others based on protected attributes.

Under a heading titled “high consequence domains,” the policy notes that you can’t use Operator to make “high-stakes decisions” that might affect your safety or well-being, to automate stock trading, or for political campaigning or lobbying.

OpenAI’s announcement follows competitor Anthropic’s October release of a similar feature that can control your computer. There is widespread hype that “agentic AI” like Operator will be a breakthrough for how people use these tools.

OpenAI CEO Sam Altman said in an announcement video that Operator is expected to roll out to international ChatGPT Pro and ChatGPT Plus users “soon,” but noted that the European rollout “will unfortunately take a while.”

More Tech


Trump administration plans to loosen rules for self-driving cars, exempt them from windshield wipers

The National Highway Traffic Safety Administration (NHTSA) said Thursday it’s planning to propose three new rules that will make it easier for self-driving car companies to develop their vehicles more cheaply. Those include getting rid of requirements that were mandatory for human drivers, including gear shift sticks, windshield defrosting and defogging systems, and some lighting equipment.

“Federal Motor Vehicle Safety Standards were written for vehicles with human drivers and need to be updated for autonomous vehicles. Removing these requirements will reduce costs and enhance safety,” NHTSA Chief Counsel Peter Simshauser said in a statement.

Earlier this year NHTSA announced it was loosening other rules around autonomous cars, including exempting them from certain federal safety rules for research and demonstration purposes. This time around, however, stocks like Tesla, which is banking on autonomous driving as part of the future of the company, aren’t moving as much on the news.

10,000

Meta’s Threads app is adding a way for users to post up to 10,000 characters, using a new feature called “text attachments.”

Currently, Threads posts can contain up to 500 characters, and people often just post screenshots of longer text. The company said it noticed users posting screenshots of text from books, articles, and podcast transcripts.

Threads competitor X allows users to post up to 25,000 characters, but the feature is only available to paid subscribers. Recently, Meta CEO Mark Zuckerberg said the platform had passed 400 million monthly active users.


Tesla’s new Robotaxi app is already near the top of Apple’s App Store

Tesla launched its Robotaxi app last night and already it’s the No. 6 most downloaded app in Apple’s free App Store. It’s also currently the top travel app, ahead of the perennially popular Uber and Lyft.

But as we’ve written, the app won’t necessarily allow you to take a ride in one of Tesla’s roughly 30 autonomous cars in Austin — or even in its more Uber-like ride-hailing service in the Bay Area. For now it just allows users to join a waitlist for the two services. (I’ll let you know when I’m in.)

Robotaxi no. 6 App Store
Apple

Tesla and xAI CEO Elon Musk is currently suing Apple, alleging the iPhone maker has kept xAI’s Grok app from ascending the App Store. Grok is currently ranked 73rd.


Amazon is reportedly testing an enterprise agentic AI tool called “Quick Suite”

Shares of Amazon are rallying as the tech titan readies itself to take another stab at the enterprise software market with a new “agentic AI” tool, according to a report from Business Insider.

“Quick Suite” is a tool that Amazon documents say helps “every business user to make better decisions, faster, and act on them swiftly by unifying AI agents for business insights, deep research, and automation into a single experience,” Business Insider reports.

Amazon is reportedly testing the software with 50 companies, including several large clients like BMW and Koch Industries, the report says. The release of the AWS-powered Quick Suite was delayed from a planned July release to September, per the report.

The timing of this report looks just about perfect for Amazon: recently, the AI trade has shifted more toward software beneficiaries than the picks-and-shovels hardware providers.


Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.