Brain in a Bubble

With WWDC on deck, Apple says “reasoning” AI models collapse with complexity

Apple tested state-of-the-art “chain of thought” models and found that they aren’t “reasoning,” but merely pattern matching, calling into question the direction the industry is taking.

Jon Keegan
6/9/25 10:34AM

Apple’s troubled AI rollout was plagued by a series of remarkable feature failures and product delays.

What was supposed to be the year of “Apple Intelligence” has failed to deliver an AI-enhanced Siri on par with voice assistants from competitors like Google, OpenAI, and Meta. This week, as Apple holds its Worldwide Developers Conference (WWDC), all eyes are on the company to see how it plans to get back into the AI race.

But behind the scenes, researchers at Apple have been digging into the competition’s latest and greatest “reasoning” models to see how they respond to tricky challenges as they scale in complexity.

In a new paper, Apple’s researchers found that the leading state-of-the-art “chain of thought” models “face a complete accuracy collapse” when the researchers dialed up the complexity of puzzle-based tests. The spectacular failures of the models led the researchers to question their “reasoning” label, calling it instead “the illusion of thinking.”

The suite of tests included puzzles like “Tower of Hanoi,” in which the player must move a stack of disks of various sizes from one post to another, one disk at a time, moving only the top disk of a post and never placing a larger disk on a smaller one.
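To give a sense of how quickly this puzzle scales, here is a minimal Python sketch of the textbook recursive solution. It is not Apple's test harness; the function names and the peg labels "A", "B", and "C" are assumptions made for illustration. The shortest solution for n disks takes 2^n - 1 moves, so each added disk roughly doubles the length of the answer a model has to produce without a single slip.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the shortest list of (disk, from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the way
        + [(n, source, target)]                      # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack on top of it
    )


def is_valid_solution(n, moves):
    """Replay the moves and check the rules described in the article."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        # Only the top disk of a peg may be moved.
        if not pegs[src] or pegs[src][-1] != disk:
            return False
        # A larger disk may never sit on a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    # Solved when every disk has reached the target peg, largest at the bottom.
    return pegs["C"] == list(range(n, 0, -1))


for n in (3, 7, 10):
    moves = hanoi_moves(n)
    print(n, len(moves), is_valid_solution(n, moves))
```

Run as written, the loop reports 7 moves for 3 disks, 127 for 7 disks, and 1,023 for 10 disks. A checker like `is_valid_solution` is one way a puzzle answer could be graded move by move, though how the paper scored its models is not described here.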

A figure from the “Illusion of Thinking” Apple paper showing models’ collapse in accuracy as the complexity is dialed up. (Source: Apple)

While the models could solve the simplest versions of the puzzles, they fell flat once things got more complex. The research tested reasoning models DeepSeek-R1, OpenAI’s o3-mini, and Anthropic’s Claude 3.7 Sonnet Thinking.

Chain of “thought”

After hitting performance plateaus from the “more data, more compute” approach, the industry followed OpenAI’s o1 release and started to build “chain of thought” reasoning models, which showed their “thought” processes.

This technique did boost the performance of large language models to new levels, offering a promising new pathway out of what looked to be a computational dead end. While the models required vastly more computational resources and time, the approach seemed to be the way forward.

Apple’s research seems to show that rather than reasoning, these models are merely displaying sophisticated pattern matching.

Apple researchers also examined the “thought” processes behind each solution to the puzzle, to better understand exactly how the models approached solutions.

The fact of the matter is that very little is known about how these recent models actually work. It remains to be seen if Apple has been cooking up an alternate approach, but reports indicate an AI-enhanced Siri isn’t likely to make a debut at this week’s WWDC.

More Tech

APPLE INTELLIGENCE

Apple AI was MIA at iPhone event

A year and a half into a bungled rollout of AI into Apple’s products, Apple Intelligence was barely mentioned at the “Awe Dropping” event.

tech

Oracle’s massive sales backlog is thanks to a $300 billion deal with OpenAI, WSJ reports

OpenAI has signed a massive deal to purchase $300 billion worth of cloud computing capacity from Oracle, according to a report from The Wall Street Journal.

The report notes that the five-year deal would be one of the largest cloud computing contracts ever signed, requiring 4.5 gigawatts of capacity.

The news is prompting Oracle shares to pare some of their massive gains, presumably because of concerns about counterparty and concentration risk.

Yesterday, Oracle shares skyrocketed as much as 30% in after-hours trading after the company forecast that it expects its cloud infrastructure business to see revenues climb to $144 billion by 2030.

Oracle shares were up as much as 43% on Wednesday.

It’s the second example in under a week of how much OpenAI’s cash burn and fundraising efforts are playing a starring role in the AI boom: the Financial Times reported that OpenAI is also the major new Broadcom customer that has placed $10 billion in orders.

Census data shows drop in large companies using AI

AI appears to be everywhere, but that doesn’t mean big companies have fully embraced the use of the technology in their day-to-day business.

tech

Report: Microsoft adds Anthropic alongside OpenAI in Office 365, citing better performance

In a move that could test its fraught $13 billion partnership, Microsoft is moving away from relying solely on OpenAI to power its AI features in Office 365 and will now also include Anthropic’s Claude Sonnet 4 model, according to a report from The Information.

The move is a tectonic shift that boosts Anthropic’s standing, heightens risks for OpenAI, and has huge ramifications for the balance of power in the fast-moving AI field.

Per the report, Microsoft executives found that Anthropic’s AI outperformed OpenAI’s on tasks involving spreadsheets and generating PowerPoint slide decks, both crucial parts of Microsoft’s Office 365 productivity suite.

Microsoft will have to pay the competition to provide the services: Amazon Web Services currently hosts Anthropic’s models while Microsoft’s Azure cloud service does not, The Information reported.

OpenAI is also reportedly working on its own productivity suite of apps.

tech

Apple announces extra slim iPhone Air, iPhone Pro with longer battery life, updated AirPods Pro 3 with live language translation, and refreshed Apple Watch line

At today’s “Awe Dropping” Apple event, the company announced its yearly refresh of the iPhone lineup. The new iPhone 17, iPhone 17 Pro, and iPhone 17 Pro Max were joined by a brand-new addition: the iPhone Air, a superthin model with tougher glass and faster processors.

Apple shares dipped on news of the product releases and are down about 1.4% on the day in afternoon trading.

The company also announced an updated Apple Watch line (Series 11, SE 3, and Ultra 3) with new features like 5G, high blood pressure detection, 24-hour battery life, and satellite communication.

Apple’s iPhone 17 (Apple)

Here’s a breakdown of the new products Apple announced:

  • The ultrathin iPhone Air was described by Apple as “a paradox you have to hold to believe.” The sleek 5.6-millimeter-thin iPhone features a crack- and scratch-resistant front and back and “MacBook Pro levels of compute,” which you can pair with a weird $59 cross-body strap. It starts at $999.

  • The iPhone 17 has a faster A19 chip, an improved smart selfie camera, and a higher-resolution screen. It starts at $799.

  • The iPhone 17 Pro has a new design, an even faster A19 Pro chip, a tougher ceramic shield on the front and back, better cameras, and a bigger battery that gets an extra 10 hours of video playback compared to its predecessor. It costs $100 more than the previous generation, but the minimum storage has doubled to 256 gigabytes. It starts at $1,099.

  • The iPhone 17 Pro Max starts at $1,199.

  • The AirPods Pro 3 have AI-powered live translation, a new heart rate sensor, eight hours of battery life, and improved active noise cancellation. The new AirPods can also track workouts, and Apple says they are built to fit more people’s ears with a new design and foam ear tips. They start at $249.

  • The Apple Watch Series 11 has 5G, a new high blood pressure detection feature, improved sleep tracking, a more scratch-resistant face, and 24 hours of battery life.

  • The entry-level Apple Watch SE 3 gets 5G, new health-tracking features, and an always-on display. It starts at $249.

  • The chunky Apple Watch Ultra 3 has an impressive 42-hour battery life, satellite communications for emergencies, and a brighter and bigger display. It starts at $799.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.