Mimeograph machine (Getty Images)

AI “plagiarism engines” like Perplexity cannot be the future of the web

We can still have the internet we want — but we have to try new business models.

Casey Newton

I.

For a while now, I’ve been gloomy about the state of the web. Plagiarism engines like Perplexity and Arc Search have attracted millions of users by ripping off other people’s work, depriving publishers of the traffic and advertising revenue that once sustained them. The approach has been successful enough that Google is now following suit.

Today, I want to talk about a more positive vision for the future of the internet — one where AI companies and creators work hand in hand to grow the web again, sharing the wealth they create with one another.

Before I get there, though, it’s worth taking a moment to reflect on how bad the status quo has gotten.

Earlier this month, Forbes noticed that Perplexity had been stealing its journalism. The AI startup had taken a scoop about Eric Schmidt’s new drone project and repurposed it for its new “pages” product, which creates automated, book-report-style web pages based on user prompts. Perplexity had apparently decided to use Forbes’ reporting to show off what its plagiarism engine can do.

Here’s Randall Lane, Forbes’ chief content officer, in a blog post.

Not just summarizing (lots of people do that), but with eerily similar wording, some entirely lifted fragments — and even an illustration from one of Forbes’ previous stories on Schmidt. More egregiously, the post, which looked and read like a piece of journalism, didn’t mention Forbes at all, other than a line at the bottom of every few paragraphs that mentioned “sources,” and a very small icon that looked to be the “F” from the Forbes logo – if you squinted. [...]

Perplexity then sent this knockoff story to its subscribers via a mobile push notification. It created an AI-generated podcast using the same (Forbes) reporting — without any credit to Forbes, and that became a YouTube video that outranks all Forbes content on this topic within Google search. 

Any reporter who did what Perplexity did would be drummed out of the journalism business. But CEO Aravind Srinivas attributed the problem here to “rough edges” on a newly released product, and promised attribution would improve over time. “We agree with the feedback you've shared that it should be a lot easier to find the contributing sources and highlight them more prominently,” he wrote in an X post.

In person, Srinivas can come across as earnest and a bit naive, as I learned when he came on Hard Fork in February. But any notion that Perplexity’s problems stem from a simple misunderstanding was dashed this week when Wired published an investigation into how the company sources answers for users’ queries. In short, Wired found compelling evidence that Perplexity is ignoring the Robots Exclusion Protocol, which publishers and other websites use to grant or deny permission to automated crawlers and scrapers.
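
For those unfamiliar with the protocol: a site publishes a plain-text robots.txt file declaring which crawlers may fetch which paths, and a well-behaved bot checks it before scraping. A minimal sketch of that check in Python, using the standard library’s robots.txt parser, looks like the following; the example.com URLs and the “PerplexityBot” user-agent string are placeholders for illustration, not a claim about how Perplexity’s crawler is actually built.

    import urllib.robotparser

    # Hypothetical crawler name and site, purely for illustration.
    AGENT = "PerplexityBot"
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    url = "https://example.com/some-article"
    if rp.can_fetch(AGENT, url):
        print(f"robots.txt permits {AGENT} to crawl {url}")
    else:
        print(f"robots.txt disallows {AGENT}; a compliant crawler stops here")

The check is entirely voluntary, which is the crux of Wired’s finding: nothing technically stops a scraper from ignoring the file.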

Here are Dhruv Mehrotra and Tim Marchman:

Until earlier this week, Perplexity published in its documentation a link to a list of the IP addresses its crawlers use—an apparent effort to be transparent. However, in some cases, as both Wired and Knight were able to demonstrate, it appears to be accessing and scraping websites from which coders have attempted to block its crawler, called Perplexity Bot, using at least one unpublicized IP address. The company has since removed references to its public IP pool from its documentation. [...]

Wired verified that the IP address in question is almost certainly linked to Perplexity by creating a new website and monitoring its server logs. Immediately after a Wired reporter prompted the Perplexity chatbot to summarize the website's content, the server logged that the IP address visited the site. This same IP address was first observed by Knight during a similar test.
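
Wired’s test is straightforward to replicate in spirit: stand up a fresh site that no crawler has any reason to know about, prompt the chatbot about it, and then look for unexpected visitors in the server’s access log. As a rough sketch only (the log path, log format, and IP address below are all hypothetical), scanning a standard access log for a particular address might look like this in Python:

    from collections import Counter

    SUSPECT_IP = "203.0.113.7"              # hypothetical address, for illustration
    LOG_PATH = "/var/log/nginx/access.log"  # assumed location of a common access log

    hits = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            # In the widely used "combined" log format, the client IP is the first field.
            hits[line.split(" ", 1)[0]] += 1

    if SUSPECT_IP in hits:
        print(f"{SUSPECT_IP} appeared {hits[SUSPECT_IP]} times; compare timestamps to your prompts")
    else:
        print(f"no visits from {SUSPECT_IP} recorded")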

Forbes sent Perplexity a cease-and-desist letter, and I imagine it won’t be the last publisher to do so. There are open legal questions about whether copyrighted material can be used to train large language models or answer chatbot queries, but I see no legal way Perplexity can get away with one of its other core techniques for building pages: using copyrighted images from Getty, the Wall Street Journal, Forbes, and others. You simply are not allowed to republish other people’s copyrighted photos and illustrations without permission, even if your plagiarism engine is new and has “rough edges.”

Perhaps Perplexity will clean up its act; once it came under fire, the company ran to Semafor to promise that it is “working on” deals with publishers. In the meantime, though, I’ve come to think of it as the Clearview AI of generative artificial intelligence companies: scraping billions of pieces of data without permission and daring courts to stop it. 

Like Clearview, Perplexity’s core innovation is ethical rather than technical. In the recent past, it would have been considered bad form to steal and repurpose journalism at scale. Perplexity is making a bet that the advent of generative AI has somehow changed the moral calculus to its benefit. 

“I think we need to work together to build all these things, rather than trying to see it as, hey, like you’re taking my stuff and using it,” Srinivas told us in February. 

But then he just kept taking everyone’s stuff and using it. The working together part, I guess, is meant to come later.

II.

One path forward for the web, as I shared on a recent episode of Search Engine, is the Fediverse. Decentralized, federated apps; portable identities and follower graphs; permissionless innovation on open protocols: this is a way journalists can once again begin to build audiences — stable ones! — rather than simply courting traffic. This is a years-long project, and I can only barely see the outlines of it taking shape. But it’s an appealing alternative to a world where all content is subsumed into a large language model and accessed by an opaque and proprietary set of algorithms. 

But this is a long-term solution, and a partial one. And it carries with it the embedded assumption that today’s AI systems cannot be reshaped in ways that actually grow the web, and pay for the labor of the people who make it. The Fediverse is about giving up on the consumer internet as we know it today — the big walled gardens, the metastasizing LLMs — and trying to build something different.

Tim O’Reilly is thinking differently. As a publisher, investor, and open source advocate, O’Reilly sits at the intersection of many of the business problems and opportunities presented by AI. On Tuesday, he offered his answer to parasitic companies like Perplexity: new business models for AI companies that pay creators based on how much of their material the companies use.

O’Reilly is starting with his own publishing business, sharing a portion of subscription revenue with (or paying a fixed fee to) authors when it uses AI to generate summaries, test questions, translations, or other derivative works based on their writing. 

He concludes:

When someone reads a book, watches a video, or attends a live training, the copyright holder gets paid. Why should derivative content generated with the assistance of AI be any different? Accordingly, we have built tools to integrate AI-generated products directly into our payment system. This approach enables us to properly attribute usage, citations, and revenue to content and ensures our continued recognition of the value of our authors’ and teachers’ work.

And if we can do it, we know that others can too.
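
O’Reilly doesn’t spell out the accounting in his post, but the underlying idea (attribute each AI-generated summary or test question to the works it drew on, then split a revenue pool in proportion to that usage) is simple enough to sketch. The author names, usage counts, and pool size below are made up purely for illustration.

    # A minimal sketch of pro-rata royalty attribution, assuming a fixed pool of
    # subscription revenue and a per-author count of how often AI-generated
    # products drew on that author's work. All figures are hypothetical.
    usage_counts = {
        "Author A": 1200,
        "Author B": 300,
        "Author C": 500,
    }
    revenue_pool = 10_000.00  # dollars set aside for AI-derived works

    total_usage = sum(usage_counts.values())
    payouts = {
        author: round(revenue_pool * count / total_usage, 2)
        for author, count in usage_counts.items()
    }

    for author, amount in payouts.items():
        print(f"{author}: ${amount:,.2f}")

The fixed-fee alternative O’Reilly mentions is even simpler; either way, the common requirement is that usage be tracked at generation time, which appears to be what his payment-system integration is for.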

To O’Reilly, this view of AI is a natural extension of the modern web, which is built on what he calls an “architecture of participation.” The earlier web consisted of giant walled gardens like AOL and MSN, which sought to keep as much activity within their own borders as possible. In this view, companies like Google, OpenAI, and Perplexity are all competing to become the next AOL. It is a vision in which most of the benefits of AI are reaped by a very small number of companies.

But this would be a mistake, he writes, if only because the current AI business models are ultimately self-defeating. “If the long-term health of AI requires the ongoing production of carefully written and edited content — as the currency of AI knowledge certainly does — only the most short-term of business advantage can be found by drying up the river AI companies drink from,” O’Reilly writes. “Facts are not copyrightable, but AI model developers standing on the letter of the law will find cold comfort in that if news and other sources of curated content are driven out of business.”

We know that AI companies are running out of data to train their frontier models on. Given that fact, it seems ludicrous that companies like Perplexity are building systems that all but ensure they will have less data to train on in the future.

O’Reilly is taking the opposite approach. And while it remains to be seen whether the average writer on his platform benefits meaningfully from AI royalties, if nothing else he has gotten the incentive structure right. Pay people to create high-quality writing and other content; use that content with permission to train powerful AI systems; and share the wealth that those systems create to fund and incentivize the production of further high-quality writing.

If Srinivas meant it when he said “we need to work together to build all these things,” he can now look to O’Reilly for a powerful example of what working together actually looks like.


Casey Newton writes Platformer, a daily guide to understanding social networks and their relationships with the world. This piece was originally published on Platformer.

More Tech

Anthropic’s Claude Opus 4.6 gains financial research, improved coding features

It’s a model-for-model battle between OpenAI and Anthropic, as the startups vie for dominance in AI coding tools.

Not to be outdone by OpenAI’s release today of GPT-5.2-Codex, Anthropic has released a new model that also improves its coding skills: Claude Opus 4.6.

According to the release, the new model can now perform financial research, adding new utility to Anthropic’s Claude Cowork tool, which recently gained legal work capabilities that prompted investors to bet against established software companies. This time, the news is sinking shares of financial research firms like FactSet and S&P Global.

Claude Opus 4.6 can help with longer, more complex coding projects and perform more detailed debugging and code review tasks. It also features improvements in its ability to work with documents, spreadsheets, and presentations.

Anthropic says the new model made strides in safety as well, showing extremely low rates of “misaligned behavior.”

OpenAI releases its answer to Claude Code, first AI model with “high capability” risk for cybersecurity

AI agents that can write code have quickly become one of the most profitable, and competitive, applications coming from the current crop of AI startups.

Anthropic’s Claude Code is enjoying a moment of popularity among software engineers, and it’s shoring up the startup’s revenue projections as it aims for an IPO this year. Claude Code’s launch, along with Anthropic’s release of Claude Cowork, which is aimed at nontechnical users, has been a key force behind software stocks’ massive recent underperformance.

Today OpenAI released its latest salvo in the AI code war: GPT-5.3-Codex, an “agentic coding” model that takes its name from OpenAI’s Codex coding app.

OpenAI says that GPT-5.3-Codex is the first model that was “instrumental in creating itself.”

According to the announcement, the new model can be used to build complex websites and interactive games, and it achieved a new industry-wide high score on the widely used SWE-Bench Pro software development benchmark.

But the model is also the first that OpenAI has released that comes with a “high capability” risk for cybersecurity, meaning the company’s evaluations showed that the tool had the potential to be used for sophisticated cyberattacks, though OpenAI says it has added mitigations to prevent such misuse.

Google’s Gemini is gaining but OpenAI’s ChatGPT is still the AI chatbot leader

Following Alphabet’s stellar earnings report Wednesday, analysts were quick to declare that the Google parent had blossomed from an AI laggard into a leader. The company posted strong revenue and profit growth, driven in part by heavy investment in artificial intelligence, and noted that its Gemini app had grown to more than 750 million monthly active users.

Still, third-party data suggests Gemini remains far behind the market leader — at least as measured by usage.

While Gemini is growing faster than OpenAI’s ChatGPT — up 19% month over month versus 4% — it still trails by a wide margin in overall usage. In January, Gemini logged more than 2 billion global visits, according to new data from Similarweb, less than half of ChatGPT’s 5.7 billion.

OpenAI’s Altman calls Anthropic an “authoritarian company” and says its Super Bowl ad is “deceptive”

Yesterday, Anthropic announced that it intends (for now) to keep its Claude chatbot free of ads. Competitors OpenAI, xAI, Meta, and Google all have expressed plans for ads in some form for their respective AI chatbots.

Anthropic also released cheeky ads depicting scenarios in which people ask questions of a personified AI chatbot, only to recoil in confusion when the response transforms into a creepy ad.

It’s pretty clear that Anthropic was poking fun at the market-leading AI chatbot, ChatGPT. The characters playing the chatbot had the pitch-perfect tone of an eager-to-please ChatGPT session.

OpenAI CEO Sam Altman tried to be a good sport, calling the ads funny, but clearly they struck a nerve, prompting a 400-word post on X in which he called the ads “deceptive,” accused Anthropic of “doublespeak,” and said it was an “authoritarian company” that was heading down a “dark path.”

Altman pushed back on the depiction of how such creepy ads could show up in chats, saying that OpenAI has pledged never to weave ads into chat conversations, knowing its users would reject that.

Previewing how the rival AI startups might battle each other in the marketplace, Altman attacked Anthropic’s focus on paid subscriptions rather than generous limits for free users (which appears to be working out pretty well for Anthropic):

“Anthropic serves an expensive product to rich people. We are glad they do that and we are doing that too, but we also feel strongly that we need to bring AI to billions of people who can’t pay for subscriptions.”

Both companies are racing to launch an IPO this year, which will only raise the stakes for this billionaire beef.

Sherwood Media, LLC produces fresh and unique perspectives on topical financial news and is a fully owned subsidiary of Robinhood Markets, Inc., and any views expressed here do not necessarily reflect the views of any other Robinhood affiliate, including Robinhood Markets, Inc., Robinhood Financial LLC, Robinhood Securities, LLC, Robinhood Crypto, LLC, or Robinhood Money, LLC.