A screenshot of OpenAI’s “Operator” agent (OpenAI)
SMOOTH OPERATOR

OpenAI’s “Operator” is here to slowly take over your computer and mess up your life

Operator made a consequential mistake 13% of the time in early testing, such as emailing the wrong person or messing up a reminder for a person to take medication.

Jon Keegan

OpenAI released a “research preview” of its AI agent that can control your web browser. Called “Operator,” it has the ability to control your mouse and keyboard and analyze things it “sees” on your computer — very, very slowly. Currently it’s only available to ChatGPT Pro users in the US.

Operator makes use of the multistep “reasoning” you can find in OpenAI’s o1 model, and the multimodal “vision” capabilities of GPT-4o. This reasoning process achieves better (but slower) performance by breaking tasks into steps. Lots and lots of steps.

In the video demonstrations shared on the product page, you can watch Operator break the task down into dozens of distinct actions like “clicking,” “typing,” and “scrolling.” One example took 152 steps to complete a grammar quiz, and another took 146 steps to determine the amount of a refund from a canceled online order.

A screenshot from a demo of OpenAI’s Operator (OpenAI)

OpenAI positions this kind of freewheeling, on-demand AI web browsing as an agent that can save you the drudgery of ordering groceries, researching holidays, making restaurant reservations, or buying concert tickets.

Operator makes high-stakes mistakes

It’s one thing when ChatGPT spits out an incorrect answer, but if your chatbot is actually spending your money and triggering things in the real world, the stakes are much, much higher.

OpenAI found that in one test of 100 sample tasks, Operator made a consequential mistake 13% of the time, such as emailing the wrong person, incorrectly bulk-removing email labels, setting the wrong date for a reminder to take the user’s medication, or ordering the wrong food item. Some of the other mistakes were easily reversible “nuisances.” OpenAI noted that after mitigations, it reduced this error rate by approximately 90%.
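
Taken at face value, the two figures above imply a residual rate of roughly 1.3%. That is a back-of-the-envelope reading, not a number OpenAI reported: it assumes the approximately 90% reduction applies directly to the 13% consequential-mistake rate, and the function and variable names in this short Python sketch are illustrative.

```python
def residual_error_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Return the error rate left after cutting a baseline rate by a relative fraction."""
    return baseline_rate * (1.0 - relative_reduction)


# Figures from the article: 13 consequential mistakes per 100 sample tasks,
# reduced by "approximately 90%" after mitigations.
baseline = 13 / 100
# Assumption: the 90% cut applies directly to the 13% rate.
after_mitigations = residual_error_rate(baseline, 0.90)

print(f"{after_mitigations:.1%}")  # prints 1.3%
```

In other words, under that assumption you would expect a consequential mistake in a little more than one task out of every 100.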

OpenAI stresses that you have the ability to grab the wheel from the AI at any time, and you can approve any action before it is executed, but in this early evaluation version, you’ll probably spend more time babysitting the agent than you would just doing the task yourself.

For now, OpenAI limits the tasks you can use Operator for, prohibiting banking and job applications.

OpenAI shared a list of example tasks that some hypothetical user might want an AI to do for them. Ten out of ten times Operator was able to research bear habitats, create a grocery list, and make a ’90s playlist on Spotify.

Medium persuasion

The system card for the model behind Operator — Computer-Using Agent (CUA) — describes the process OpenAI used to assess the risks of letting a prerelease, novel AI agent go hog wild with your computer.

As with other model releases, OpenAI tested the model using red teams with expertise in social engineering, CBRN (chemical, biological, radiological, and nuclear) threats, and cybersecurity. OpenAI gave itself a “low” risk score for everything except “persuasion,” which got a “medium” risk score that is still considered safe enough for public release.

High consequence

But there are some important restrictions on how you can use Operator. Because there is a slightly elevated risk of using Operator for influencing people, the usage policy prohibits impersonating people or organizations, concealing the role of AI in tasks, or using it to spread disinformation or false interactions, like fake reviews or fake profiles.

OpenAI prohibits people from using Operator to commit any crimes, but you are also prohibited from using it to bully, harass, defame, or discriminate against others based on protected attributes.

Under a heading titled “high consequence domains,” it notes that you can’t use Operator to make “high-stakes decisions” that might affect your safety or well-being, to automate stock trading, or for political campaigning or lobbying.

OpenAI’s announcement follows competitor Anthropic’s October release of a similar feature that can control your computer. There is widespread hype that “agentic AI” like Operator will be a breakthrough for how people use these tools.

OpenAI CEO Sam Altman said in an announcement video that Operator is expected to roll out to international ChatGPT Pro and ChatGPT Plus users “soon,” but noted that the European rollout “will unfortunately take a while.”

More Tech

“Tesla killer” Slate Auto switches CEOs ahead of launch later this year

Just months before the expected launch of its $25,000 truck, so-called Tesla killer Slate Auto has swapped out its CEO. Former Amazon Marketplace Vice President Peter Faricy is the new leader of the Jeff Bezos-backed company, while the previous CEO, Chris Barman, one of the electric truck maker’s first employees, is now president of vehicles.

“The marketplace component is really important to us. Being able to understand how to sell things in the 21st century is really important because we’re gonna be direct to consumer, without dealerships,” Jeff Jablansky, head of communications at Slate, said of the change. “The way Chris put it is, we are adding horsepower at a critical moment when people are going to be able to actually order their trucks.”

In a social media post just last month, then-CEO Barman said the company would unveil in June the exact price tag for its Blank Slate, which goes on sale late in 2026, but reaffirmed it will be in the mid-$20,000s.

Amazon’s autonomous ride-hailing service now testing in 10 markets

Amazon self-driving subsidiary Zoox announced Monday that it’s testing in two additional markets, Phoenix and Dallas, bringing its total to 10 US markets. The company will begin by mapping select neighborhoods using retrofitted Toyota Highlander SUVs with safety drivers behind the wheel, before progressing to autonomous testing and eventually rolling out its steering-wheel-less, purpose-built vehicles for public users.

The service is currently available to the public in Las Vegas and to select users in the Bay Area, where it’s served 300,000 riders.

Zoox is also opening a third “Fusion Center” facility in Arizona, after Las Vegas and the Bay Area, from which it will provide assistance and coordinate operations for its fleet.

Zoox’s expansion comes as Alphabet’s Waymo recently reached its 10th public market and as Tesla’s Robotaxi says it plans to open in six new markets in the first half of the year.

Microsoft will use Anthropic’s Claude to power “Copilot Cowork”

Microsoft is partnering with Anthropic to power its new agentic offering, Copilot Cowork. The AI world is abuzz with agents that can do your busywork for you, and Anthropic’s Claude Cowork is one of the most prominent and capable offerings in the field.

The tech giant wrote:

“Working closely with Anthropic, we have integrated the technology behind Claude Cowork into Microsoft 365 Copilot. It is this multi-model advantage that makes Copilot different. Your work is not limited by one brand of models.”

Microsoft listed some examples of how Copilot Cowork could help with common tasks such as rescheduling meetings, sending emails, researching companies, working with spreadsheets, and making presentations.

It’s worth stepping back to note how wild it is that Microsoft, the productivity software behemoth that has absolutely dominated the business world for decades, has had to turn to an AI startup to control those apps.

China’s smartphone slump could strengthen Apple

China smartphone shipments fell 22% year over year in January, according to a new Bernstein research note. The drop was partly due to the timing of Lunar New Year and tough comparisons with last year, when government subsidies boosted sales, but rising memory costs are also weighing on demand — especially in the lower-end segment dominated by Chinese brands.

Low-tier shipments fell 37%, hitting brands like Honor and Vivo particularly hard, while high-end sales from Apple and Huawei held up better. Overall average selling prices rose 13%.

That could be good news for Apple, which sits at the more price-insulated upper end of the Chinese market and has been making a comeback in the country. Apple’s market share grew to 18% in January — in line with Huawei — from 14% a year earlier, while the rest of the market fell 2 percentage points to 65%.

With its scale and industry-leading margins, the iPhone maker is better positioned to absorb higher memory costs. To wit: it recently unveiled the $599 iPhone 17e, which keeps its entry price steady with its predecessor while doubling storage.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.