Tech
Boxing Robots
(Getty Images)

Big Tech's onslaught of bots forces publishers to play an impossible game of Whac-A-Mole

In the battle to protect their valuable content from data-hungry AI scraping bots, publishers have to rely on a single text file for their defense.

For thirty years, website owners have used the humble "robots.txt" file to tell automated scrapers which content they may index and which content should be kept out of search engines.

But now that tech companies are racing to ingest as much content as possible to train their AI models, the robots.txt file is also the only place where publishers can refuse to be scraped and potentially used for AI training, provided they know exactly which scrapers to block. Scrapers identify themselves using names like Google's "googlebot," Meta's "facebookbot," or OpenAI's "gptbot," which appear in the "user agent" description of the web page request.
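As a rough illustration of how such a file works, the sketch below uses Python's standard urllib.robotparser module to check which bots a robots.txt file would turn away. The file contents and the example.com URL are hypothetical, made up for this example rather than taken from any publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, user_agent, url="https://example.com/story"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ("GPTBot", "FacebookBot", "Googlebot", "Meta-ExternalAgent"):
    print(bot, "allowed" if allowed(ROBOTS_TXT, bot) else "blocked")
```

With this file, GPTBot and FacebookBot are refused, while a bot the file does not name, such as Meta-ExternalAgent, falls through to the wildcard rule and is allowed. The check only reports what the file asks for; it does nothing to stop a scraper that ignores it.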

Publishers must increasingly play a game of Whac-A-Mole, adding new scrapers (like the one Meta recently let loose) to their robots.txt files as they pop up. Once a site has been scraped for AI training without permission, content owners have little recourse other than the courts.

Data journalist Ben Welsh's homepages.news project collects automated snapshots of top news websites, as well as the contents of their robots.txt files. In a recent sample of Welsh's data from Aug. 16-17, about 40% of top news sites blocked all scrapers. The most blocked scraper was OpenAI's "gptbot," with about 24% of the news sites blocking it. Meta's new "Meta-ExternalAgent" bot, which appeared in July, was blocked by only around 17% of sites.
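One plausible way to compute a figure like this is to parse each collected robots.txt file and count how many refuse a given bot. The sketch below is an assumption about the approach, not a description of how the numbers above were produced; the site names and file contents are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks(robots_txt, user_agent):
    """Return True if robots_txt refuses user_agent access to the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def share_blocking(samples, user_agent):
    """Fraction of sites in samples (site -> robots.txt text) that block user_agent."""
    if not samples:
        return 0.0
    blocked = sum(blocks(text, user_agent) for text in samples.values())
    return blocked / len(samples)


# Hypothetical sample of four sites, for illustration only.
SAMPLES = {
    "site-a.example": "User-agent: *\nDisallow: /\n",
    "site-b.example": "User-agent: GPTBot\nDisallow: /\n",
    "site-c.example": (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: Meta-ExternalAgent\nDisallow: /\n"
    ),
    "site-d.example": "",
}

for bot in ("GPTBot", "Meta-ExternalAgent"):
    print(f"{bot}: {share_blocking(SAMPLES, bot):.0%} of sample sites block it")
```

Note that a site that disallows every user agent is counted as blocking each named bot too, which is one reason different counting rules can give different percentages.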

Earlier this year, the Reuters Institute published a report that found that by the end of 2023, 79% of US-based news websites were blocking OpenAI's bot.

The entire robots.txt mechanism is voluntary, and many companies have been caught ignoring it altogether. If a company changes the name of its bot, or releases a new one that is not yet named in the file, publishers may not know to block it.
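To make the Whac-A-Mole problem concrete, here is a minimal sketch of how a publisher might spot bots that its robots.txt does not yet name, by comparing the file against user agent strings seen in incoming requests. The token pattern (names ending in "bot" or "agent") is a crude heuristic assumed for this example, and the user agent strings are hypothetical. A scraper that disguises its user agent would not be caught this way.

```python
import re

# Heuristic assumed for illustration: a name ending in "bot" or "agent".
BOT_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*(?:bot|agent)\b", re.IGNORECASE)


def named_agents(robots_txt):
    """Lower-cased names on User-agent lines, excluding the '*' wildcard."""
    names = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            value = value.strip().lower()
            if value and value != "*":
                names.add(value)
    return names


def unlisted_bots(observed_user_agents, robots_txt):
    """Bot-like tokens seen in requests that robots_txt does not name."""
    known = named_agents(robots_txt)
    unlisted = set()
    for user_agent in observed_user_agents:
        for token in BOT_TOKEN.findall(user_agent):
            if token.lower() not in known:
                unlisted.add(token)
    return sorted(unlisted)


# Hypothetical inputs, for illustration only.
robots_txt = "User-agent: GPTBot\nDisallow: /\n"
observed = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "meta-externalagent/1.1",
    "Mozilla/5.0 (Windows NT 10.0) Firefox/128.0",
]
print(unlisted_bots(observed, robots_txt))
```

Here the file names only GPTBot, so the newer Meta crawler is flagged as unlisted and the ordinary browser request is ignored.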

Business Insider reported earlier Wednesday that Tesla was taking up the AI agent mantle as xAI’s similar project stalled, but Musk’s post suggests the initiatives are more intertwined than previously understood.

The collaboration marks the latest example of Musk’s companies working closely together, further blurring the lines between Tesla and the recently merged SpaceX-xAI entity.

tech

Meta doubles down on custom inference chips after reportedly scrapping training chip

Meta said today that it’s expanding its custom silicon development to include four new generations of Meta Training and Inference Accelerator (MTIA) chips. The announcement comes just weeks after The Information reported that the social media company had scrapped its most advanced AI training chip, dubbed Olympus, after facing design challenges. In the meantime, it signed outside chip deals with Nvidia and Advanced Micro Devices.

Early in its recent conference call, Broadcom CEO Hock Tan sought to reassure investors that the custom chip specialist’s relationship with the social media giant was only getting stronger.

“Now contrary to recent analyst reports, Meta’s custom accelerator MTIA road map is alive and well,” he said. “We’re shipping now.”

The new road map suggests Meta’s in-house chips will focus more on inference, which has more predictable workloads, over training — a technically more demanding area dominated by Nvidia:

“MTIA 300 will be used for ranking and recommendations training, and is already in production. MTIA 400, 450 and 500 will be capable of handling all workloads, but we will primarily use these chips to support GenAI inference production in the near future and into 2027.”

Meta CFO Susan Li told attendees at Morgan Stanley’s tech conference earlier this month that the company “eventually” plans to expand its custom chip design to include training models.

tech

Google completes acquisition of Wiz — its biggest ever

Today Google said it has completed its $32 billion acquisition of cybersecurity startup Wiz, the largest deal in the company’s history.

“This acquisition is an investment by Google Cloud to improve cloud security and enable organizations to build fast and securely across any cloud or AI platform,” the company wrote in the press release.

The companies agreed to the all-cash purchase last year, after quite a bit of back-and-forth.

Alphabet updated acquisitions chart
Sherwood News
David Crowther, Claire Yubin Oh
Perp walk
Perplexity has been left behind in the AI wars

Once touted as a potential Google killer, the AI search engine’s traffic has been flat over the last year, while peers like Claude have surged ahead.

Amazon Wins Suit Against Perplexity, Blocking AI Shopping Agent

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.