The toughest AI benchmark just got a whole lot tougher
ARC-AGI-3 is the latest version of a clever benchmark that challenges AI models to solve mini video games with no written instructions.
The flood of new AI models with increasingly advanced “reasoning” capabilities is forcing the AI industry to retire early benchmark tests and invent new ones that probe a wider range of skills.
To watch the evolution of one such test — ARC-AGI — is to witness the huge technical leaps that today’s generative-AI models have made in a few short years. Tech CEOs brag about their models’ high scores on ARC-AGI, which is widely considered one of the most distinctive and difficult AI benchmarks in use today.
Rather than testing how well a model can translate an inscription on an ancient Roman tombstone, or offer a diagnosis for a complex medical case, ARC-AGI challenges AI models to analyze abstract geometric puzzles and games without any written instructions. This forces the models to construct solutions to complex multistep problems rather than regurgitate text from their training data.
We created an in-house game studio and built 135 novel environments from scratch
— ARC Prize (@arcprize) March 25, 2026
No instructions, Core Knowledge Priors-only
In order to beat these games, AI must:
• Explore the environment
• Form hypotheses
• Execute a plan
• Learn and adapt
The just-launched version, ARC-AGI-3, is essentially a collection of mini games in which the player moves simple shapes across a pixelated game board using directional arrows. By design, the games are easy for humans to figure out after a few minutes of experimentation, but incredibly difficult for computers to solve.
One of the fascinating new features of the latest version is a replay mode that lets human observers read through AI models’ “chain of thought” transcript to see how a model breaks down the problem and attempts a solution.
Humans can play through these games on the project’s website. For now it seems humans don’t have much to worry about.
The most capable state-of-the-art models in the wild have yet to crack even a 1% score. The current ARC-AGI-3 leaderboard shows OpenAI’s GPT-5.4 in the lead at 0.3%, with Anthropic’s Opus 4.6 and Google’s Gemini 3.1 Pro tied for second place. xAI’s Grok 4.20 Reasoning model scored 0%.