AGENT FOR HIRE

OpenAI’s new “agent” is here to help you buy tuxedos, book luxury hotels, and order batches of cupcakes excruciatingly slowly

The new tool is trained to help execute real-world web-based tasks, but comes with significant risks and caveats.

Jon Keegan

“Agentic AI” is the latest buzzword in tech, as the industry furiously races to monetize the costly technology. Yesterday OpenAI announced its second offering in the emerging category: “ChatGPT Agent,” which comes on the heels of January’s browser-controlling “Operator” tool.

The promise of a superpowered AI agent is that with a simple prompt, your agent can toil away behind the scenes and take care of the busywork you can’t be bothered with. That could be boring tasks in your day-to-day life, chores, or the things you actually get paid to do for your job.

In a post on X, OpenAI cofounder and CEO Sam Altman said of the new technology:

In the livestream announcing Agent, the OpenAI team demonstrated the tool tackling some of the boring tasks it could spare you from.

For example, you’re probably dreading all of the complex planning and research required to attend your dear friends’ wedding in Hawaii. What outfit should you buy? And what gift? Agent has you covered.

Perhaps Agent was trained to solve problems for people with AI researcher pay packages, but after thinking for about 18 minutes (with a break to ask the user a question), Agent came to the rescue to help find a $1,500 Brooks Brothers tuxedo and a nice hotel to stay at in Maui for $4,600 (five nights).

In a demo for Wired, an OpenAI researcher showed how Agent could order a batch of cupcakes — which took it one hour to complete.

The fact that a newly released, bleeding-edge AI tool like Agent takes so long to execute a query isn’t just about wasting your valuable time.

The kind of “reasoning” that Agent is undertaking while executing these queries is among the most computationally expensive of the services that OpenAI offers.

OpenAI describes its advanced “deep research” queries, which the company says are a part of Agent’s capabilities, as “very compute intensive.” OpenAI is currently losing money on its $200 per month all-you-can-eat plan for intensive queries.

All that computation could also mean a significant amount of water and energy consumed for what would normally be very lightweight tasks when performed by a human.

The company said Agent would be rolling out to ChatGPT Plus, Pro, and Team users yesterday. Pro and Team subscribers get 400 Agent queries per month, and Plus users get 40 per month. (It wasn’t available for my account to try out before I wrote this.)

Striving to Excel

In addition to the ability to sift through your emails and calendars, OpenAI is playing up Agent’s ability to create and edit documents like slide decks and spreadsheets. The announcement highlighted Agent’s superior accuracy when editing spreadsheets “derived from real-world scenarios” compared to Microsoft Copilot using Excel: “ChatGPT agent outperforms existing models by a significant margin.”

Agent (with .xlsx access) scored a 45.5% accuracy rating, while Copilot in Excel scored only 20%. But ChatGPT Agent’s score is actually not great when you consider that humans scored 71.3%.

“New risk surface”

OpenAI has always offered frank assessments of the risks of its new tools, and Agent is no exception. Choosing to take a “precautionary approach,” OpenAI is treating Agent as having “High Biological and Chemical capabilities” according to its “Preparedness Framework.” That document describes this new “high” capability as:

“The model can provide meaningful counterfactual assistance (relative to unlimited access to baseline of tools available in 2021) to novice actors (anyone with a basic relevant technical background) that enables them to create known biological or chemical threats.”

The “associated risk” of this threshold:

“Significantly increased likelihood and frequency of biological or chemical terror events by non-state actors using known reference-class threats.”

In an email to Sherwood News, OpenAI spokesperson Niko Felix explained that Agent mode is not the default model and users are free to choose to use it at any time. Felix said Agent is trained to explicitly ask users for permissions for tasks with real-world consequences, such as online purchases. And for now, Agent is trained to refuse high-risk tasks like banking.

Felix also cited a caveat in the announcement:

“While ChatGPT agent is already a powerful tool for handling complex tasks, today’s launch is just the beginning. We’ll continue to iteratively add significant improvements regularly, making it more capable and useful to more people over time.”

One of the concerns that OpenAI red teams had was the risk of novel “prompt injections” for Agent that could trick users into sharing personal data or taking actions that they shouldn’t.

In a post on X, Altman said he would caution his own family against using Agent for “high-stakes uses” until the company has had a chance to study it in the wild.

“But we do want people to treat this as a new technology and a new risk surface,” Altman said in the livestream video. “But that said, we hope you’ll love it.”

More Tech

Rani Molla

After Tesla earnings, prediction markets think unsupervised FSD is less likely than ever to be rolled out this year

Tesla’s unsupervised full self-driving technology, which would autonomously ferry passengers around without a human driver having to pay attention, is supposed to help catapult the electric vehicle company’s valuation further into the stratosphere. It was also supposed to be available this year, but prediction markets participants, as well as former Tesla self-driving leaders, no longer think that will happen.

On Tesla’s earnings call this week, CEO Elon Musk said the company now had “clarity” on achieving unsupervised full self-driving, something he’s repeatedly said would be available at least in some markets this year.

The comments seemed to give Polymarket prediction markets participants some clarity. There, the market-implied probability that Tesla will release unsupervised FSD this year reached its lowest point since the event contract was opened in May.

The odds of it happening had been pretty high up until late June, when Tesla’s long-awaited robotaxi launched with a safety driver in the passenger seat. The unsupervised FSD event contract specifies the feature can have “no requirement for human intervention.”

Rani Molla

Banks prepare record $38 billion debt financing to fund Oracle-tied data centers

Banks led by JPMorgan and Mitsubishi UFJ are preparing a $38 billion debt offering to fund two Oracle-tied data centers in Texas and Wisconsin, Bloomberg reports. The projects, developed by Vantage Data Centers, will support Oracle’s $500 billion Stargate AI infrastructure push with OpenAI and Nvidia.

The loans — $23.25 billion for Texas and $14.75 billion for Wisconsin — are expected to mature in four years, price about 2.5 percentage points higher than the benchmark rate, and mark the largest AI infrastructure financing to date.

Oracle executives recently said that the company anticipates cloud gross margins will reach 35% and that it expects to see $166 billion in cloud infrastructure revenue by FY 2030.

Oracle is up 1.5% premarket.

Rani Molla

Google rises on official announcement of Anthropic deal worth “tens of billions”

Google has officially announced its deal to provide more AI compute to Anthropic, which Bloomberg reported earlier this week. In order to train and serve its Claude model, Anthropic has agreed to pay Google Cloud “tens of billions of dollars” to access up to 1 million tensor processing units, or TPUs, as well as other cloud services.

Google, of course, has a 14% stake in Anthropic, making this one of the many circular AI deals happening at the moment.

“Anthropic and Google have a longstanding partnership and this latest expansion will help us continue to grow the compute we need to define the frontier of AI,” Anthropic CFO Krishna Rao said in the press release. “Our customers — from Fortune 500 companies to AI-native startups — depend on Claude for their most important work, and this expanded capacity ensures we can meet our exponentially growing demand while keeping our models at the cutting edge of the industry.”

The announcement has sent Google up again, more than 1% premarket.

Rani Molla

Report: Snap seeking $1 billion to finance its AR glasses division in “existential” fundraise

Snap is down more than 1% this morning following news that the company is attempting to raise $1 billion for its AR glasses unit in what someone told Sources.news was an “existential” fundraise.

A Snap spokesperson countered, “We do not need to raise money to execute against our plans to publicly launch Specs in 2026, but remain open to opportunities that could accelerate our growth.”

Multiple investors are involved in the talks, including Saudi Arabia’s Public Investment Fund, according to Sources.news. The report also noted that Snap plans to turn the unit that makes its Specs glasses into an independent subsidiary à la Google’s Waymo “that can continue raising capital from investors.”

Snap plans to produce about 100,000 units of next year’s Specs, pricing them around $2,500.

The beleaguered stock saw quite a bit of retail interest last month, amid r/WallStreetBets chatter that its low nominal price made it a potential acquisition target.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.