Tech
AGENT FOR HIRE

OpenAI’s new “agent” is here to help you buy tuxedos, book luxury hotels, and order batches of cupcakes excruciatingly slowly

The new tool is trained to help execute real-world web-based tasks, but comes with significant risks and caveats.

Jon Keegan
7/18/25 9:29AM

“Agentic AI” is the latest buzzword in tech, as the industry furiously races to monetize the costly technology. Yesterday OpenAI announced its second offering in the emerging category: “ChatGPT Agent,” which comes on the heels of January’s browser-controlling “Operator” tool.

The promise of a superpowered AI agent is that with a simple prompt, your agent can toil away behind the scenes and take care of the busywork you can’t be bothered with. That could be boring tasks in your day-to-day life, chores, or the things you actually get paid to do for your job.

In a post on X, OpenAI cofounder and CEO Sam Altman said of the new technology:

In the livestream announcing Agent, the OpenAI team demonstrated the tool tackling some of the boring tasks it could spare you from.

For example, you’re probably dreading all of the complex planning and research required to attend your dear friends’ wedding in Hawaii. What outfit should you buy? And what gift? Agent has you covered.

Perhaps Agent was trained to solve problems for people with AI researcher pay packages, but after thinking for about 18 minutes (with a break to ask the user a question), Agent came to the rescue to help find a $1,500 Brooks Brothers tuxedo and a nice hotel to stay at in Maui for $4,600 (five nights).

In a demo for Wired, an OpenAI researcher showed how Agent could order a batch of cupcakes — which took it one hour to complete.

The fact that a newly released, bleeding-edge AI tool like Agent takes so long to execute a query isn’t just about wasting your valuable time.

The kind of “reasoning” that Agent is undertaking while executing these queries is among the most computationally expensive of the services that OpenAI offers.

OpenAI describes its advanced “deep research” queries, which the company says are a part of Agent’s capabilities, as “very compute intensive.” OpenAI is currently losing money on its $200 per month all-you-can-eat plan for intensive queries.

That could also mean a significant amount of water and energy consumed for tasks that would normally be very lightweight when performed by a human.

The company said Agent would be rolling out to ChatGPT Plus, Pro, and Team users yesterday. Pro and Team subscribers get 400 Agent queries per month, and Plus users get 40 per month. (It wasn’t available for my account to try out before I wrote this.)

Striving to Excel

In addition to the ability to sift through your emails and calendars, OpenAI is playing up Agent’s ability to create and edit documents like slide decks and spreadsheets. The announcement highlighted Agent’s superior accuracy when editing spreadsheets “derived from real-world scenarios” compared to Microsoft Copilot using Excel: “ChatGPT agent outperforms existing models by a significant margin.”

Agent (with .xlsx access) scored a 45.5% accuracy rating, while Copilot in Excel scored only 20%. But ChatGPT Agent’s score is actually not great when you consider that humans scored a 71.3%.

“New risk surface”

OpenAI has always offered frank assessments of the risks of its new tools, and Agent is no exception. Choosing to take a “precautionary approach,” OpenAI is treating Agent as having “High Biological and Chemical capabilities” according to its “Preparedness Framework.” That document describes this new “high” capability as:

“The model can provide meaningful counterfactual assistance (relative to unlimited access to baseline of tools available in 2021) to novice actors (anyone with a basic relevant technical background) that enables them to create known biological or chemical threats.”

The “associated risk” of this threshold:

“Significantly increased likelihood and frequency of biological or chemical terror events by non-state actors using known reference-class threats.”

In an email to Sherwood News, OpenAI spokesperson Niko Felix explained that Agent mode is not the default model and users are free to choose to use it at any time. Felix said Agent is trained to explicitly ask users for permissions for tasks with real-world consequences, such as online purchases. And for now, Agent is trained to refuse high-risk tasks like banking.

Felix also cited a caveat in the announcement:

“While ChatGPT agent is already a powerful tool for handling complex tasks, today’s launch is just the beginning. We’ll continue to iteratively add significant improvements regularly, making it more capable and useful to more people over time.”

One of the concerns that OpenAI red teams had was the risk of novel “prompt injections” for Agent that could trick users into sharing personal data or taking actions that they shouldn’t.

In a post on X, Altman said he would caution his own family against using Agent for “high-stakes uses” until the company has had a chance to study it in the wild.

“But we do want people to treat this as a new technology and a new risk surface,” Altman said in the livestream video. “But that said, we hope you’ll love it.”

More Tech

tech

Report: Microsoft adds Anthropic alongside OpenAI in Office 365, citing better performance

In a move that could test its fraught $13 billion partnership, Microsoft is moving away from relying solely on OpenAI to power its AI features in Office 365 and will now also include Anthropic’s Claude Sonnet 4 model, according to a report from The Information.

The move is a tectonic shift that boosts Anthropic’s standing, heightens risks for OpenAI, and has huge ramifications for the balance of power in the fast-moving AI field.

Per the report, Microsoft executives found that Anthropic’s AI outperformed OpenAI’s on tasks involving spreadsheets and generating PowerPoint slide decks, both crucial parts of Microsoft’s Office 365 productivity suite.

Microsoft will have to pay the competition to provide the services: Amazon Web Services currently hosts Anthropic’s models while Microsoft’s Azure cloud service does not, The Information reported.

OpenAI is also reportedly working on its own productivity suite of apps.

tech

Apple announces extra slim iPhone Air, iPhone Pro with longer battery life, updated AirPods Pro 3 with live language translation, and refreshed Apple Watch line

At today’s Awe Dropping Apple event, the company announced its yearly refresh of the iPhone lineup. The new iPhone 17, iPhone 17 Pro, and iPhone 17 Pro Max were joined by a brand-new addition: the iPhone Air, a superthin model with tougher glass and faster processors.

Apple shares dipped on news of the product releases and are down about 1.4% on the day in afternoon trading.

The company also announced an updated Apple Watch line — Series 11, SE3, and Ultra 3 — with new features like 5G, high blood pressure detection, 24-hour battery life, and satellite communication. 

Here’s a breakdown of the new products Apple announced:

  • The ultrathin iPhone Air was described by Apple as “a paradox you have to hold to believe.” The sleek 5.6-millimeter-thin iPhone features a crack- and scratch-resistant front and back and “MacBook Pro levels of compute,” which you can pair with a weird $59 cross-body strap. It starts at $999.

  • The iPhone 17 has a faster A19 chip, an improved smart selfie camera, and a higher-resolution screen. It starts at $799.

  • The iPhone 17 Pro has a new design, an ever-faster A19 Pro chip, a tougher ceramic shield on the front and back, better cameras, and a bigger battery that gets an extra 10 hours of video playback compared to its predecessor. It costs $100 more than the previous generation, but the minimum storage has doubled to 256 gigabytes. It starts at $1,099.

  • The iPhone 17 Pro Max starts at $1,199.

  • The AirPods Pro 3 have AI-powered live translation, a new heart rate sensor, eight hours of battery life, and improved active noise cancellation. The new AirPods can also track workouts, and Apple says they are built to fit more people’s ears with a new design and foam ear tips. They start at $249.

  • The Apple Watch Series 11 has 5G, a new high blood pressure detection feature, improved sleep tracking, a more scratch-resistant face, and 24 hours of battery life.

  • The entry-level Apple Watch SE 3 gets 5G, new health-tracking features, and an always-on display. It starts at $249.

  • The chunky Apple Watch Ultra 3 has an impressive 42-hour battery life, satellite communications for emergencies, and a brighter and bigger display. It starts at $799.

tech

Nebius soars after signing a 5-year deal with Microsoft to supply nearly $20 billion worth of AI computing power

Artificial intelligence infrastructure group Nebius jumped more than 50% in early trading on Tuesday after the company announced, following Monday’s market close, a major deal to supply computing power for Microsoft’s AI operations.

Under the agreement, Nebius — which rose from the ashes of Russian tech giant Yandex — will provide Microsoft “access to dedicated GPU infrastructure capacity in tranches at its new data center in Vineland, New Jersey over a five-year term.” The New Jersey data center has a capacity of 300 megawatts. The total contract value through 2031 is $17.4 billion, though, if further capacity is required, the contract value could rise to $19.4 billion.

The deal represents a sizable portion of Microsoft’s proposed annual capital expenditure on AI, which is expected to reach $120 billion by the end of fiscal 2026.

Nebius and competitor CoreWeave are both on the short list of startups that Nvidia has invested in. Nvidia’s small stake in the former is now worth about $120 million.

Here are the Trump ties among the tech leaders who had dinner at the White House

Many of the attendees have donated to, vocally supported, or even worked for the president.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.