By mid-2025, approximately 35% of new websites had been created fully or partially using artificial intelligence, according to researchers at Stanford University.
Before OpenAI publicly launched ChatGPT in November 2022, the figure was zero. In the years since, the share of AI-generated content has risen to more than a third of recent online publications.
Researchers analyzed 33 months of archived website copies from the Wayback Machine using the Pangram v3 detector. Their goal was to understand how the growth of AI-generated texts is reshaping the structure of the World Wide Web.
Main Changes
The researchers noted a decrease in semantic diversity. Pages generated by neural networks are 33% more similar to each other than texts written by humans. Different websites increasingly retell the same ideas using nearly identical phrases.
According to the authors, the issue goes beyond mass AI copywriting: the variety of expressions and ideas is gradually narrowing. Large language models (LLMs) tend to choose the most "average" responses, which makes the resulting discourse repetitive.
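To make a figure such as "33% more similar" concrete, here is a minimal sketch of one way to score how alike a group of texts is: the average cosine similarity over all pairs of texts. It is an illustration only. The article does not describe how the study measured similarity, and the word-count representation and the function names below are assumptions made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts):
    """Average similarity over every pair of texts in the group."""
    vectors = [bag_of_words(text) for text in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two texts to compare")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def percent_more(value, baseline):
    """Relative difference in percent: how much larger value is than baseline."""
    return (value - baseline) / baseline * 100.0
```

Under these assumptions, calling mean_pairwise_similarity once on a set of AI-generated pages and once on a set of human-written pages, then passing the two scores to percent_more, would yield a figure of the same kind as the one reported. A real analysis would more likely compare texts by meaning, for instance with sentence embeddings, rather than by raw word counts.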
The emotional tone of publications has also changed. AI-generated content is 107% more positive than human-written content, which makes it roughly twice as positive. The Stanford researchers linked this to the already documented tendency of LLMs to be overly agreeable.
During training, developers optimize neural networks for pleasant, safe, and socially acceptable responses. Consequently, a significant portion of new websites creates a "sterile, friendly" information environment. This results in fewer harsh judgments and conflicts, but also less vibrant human debate.
What Wasn't Confirmed
Several popular concerns were not statistically supported. The researchers found no significant correlation between the rise of AI content and a decrease in factual accuracy, an increase in explicit errors, or a stylistic convergence of texts toward a single template.
The researchers also singled out an effect that has so far been discussed mostly in theory: model collapse.
If new neural networks are trained on data containing a lot of AI content, the system begins to process its own averaged responses. This reduces variability, degrades quality, and risks future LLMs learning not from humans but from a "synthetic echo" of their predecessors.
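The narrowing described above can be illustrated with a toy simulation. This is a rough analogy under stated assumptions, not the study's method and not a picture of how LLMs are actually trained. Here the "model" is just a normal distribution fitted to a small sample, and each generation is trained only on samples drawn from the previous generation's model. The function name and parameters are hypothetical.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Return the spread (standard deviation) of the data at each generation."""
    rng = random.Random(seed)
    # Generation 0: "human" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spread = [statistics.pstdev(data)]
    for _ in range(generations):
        # "Train" a model: estimate the mean and spread of the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        # The next generation sees only samples produced by that model.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        spread.append(statistics.pstdev(data))
    return spread
```

Because each fit is made from a small sample and slightly underestimates the true spread, the estimation errors compound: over many generations the standard deviation drifts toward zero and the simulated data loses its variability. Comparing the first and last values of the returned list shows the effect.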
The researchers, together with the Internet Archive, plan to turn this study into a system for continuously monitoring the share of AI content on the internet.
Earlier, in mid-April, Stanford University noted the rapid pace of artificial intelligence development: researchers reported that neural networks are nearly on par with humans at performing tasks on computers.
