Perplexity Accused of Stealthily Scraping Websites That Explicitly Blocked Its AI Bots

Cloudflare has accused AI search startup Perplexity of bypassing explicit website restrictions by covertly scraping content, including from domains that block its bots through robots.txt files and firewall rules. If accurate, the findings point to deliberate circumvention of web standards for data collection.

What Cloudflare Found

  • Perplexity allegedly ignored robots.txt directives blocking its official crawlers (PerplexityBot, Perplexity‑User).
  • When blocked, it reportedly switched to an undeclared crawler that impersonated Google Chrome on macOS, and rotated through IP addresses and autonomous system numbers (ASNs) to evade detection.
  • Cloudflare says it observed this behavior across tens of thousands of domains, amounting to millions of stealth requests per day.
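The sketch below makes the robots.txt mechanism concrete using Python's standard-library `urllib.robotparser`. The robots.txt contents, the URL, and the Chrome user-agent string are hypothetical examples. Only the agent names PerplexityBot and Perplexity-User come from the reporting. It is an illustration, not a reconstruction of Cloudflare's tests. It shows why a block keyed on a crawler's declared name does not cover a client that presents itself as an ordinary browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: it blocks the two declared
# Perplexity agents named in the reporting and allows every other agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page on the publisher's site.
URL = "https://example.com/news/article"

# Illustrative Chrome-on-macOS user-agent string (not taken from Cloudflare's report).
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str) -> bool:
    """Return whether the robots.txt above permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # robotparser matches only the token before the first "/" of the
    # user-agent, so the declared crawlers are passed by name.
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_UA):
        label = agent if len(agent) < 30 else "Chrome on macOS"
        print(f"{label}: allowed={is_allowed(agent, URL)}")
```

Running it reports `allowed=False` for both declared agents and `allowed=True` for the browser string, because only the catch-all `*` group applies to it. robots.txt is advisory. Nothing in the protocol stops a client from ignoring it or from sending a different user-agent, so enforcement falls to infrastructure such as firewalls.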

Perplexity’s Response
Perplexity dismissed the claims as misleading, calling Cloudflare's blog post a "publicity stunt" and arguing that Cloudflare misunderstands how AI assistants operate. The company says its platform relies on user-driven agents that fetch content on demand, not on mass crawling for training data. It also denied that the undeclared crawlers were its own. (India Today)

Historical and Legal Context
This controversy comes amid broader legal and ethical scrutiny:

  • In 2024, Wired and other outlets reported that Perplexity ignored robots.txt restrictions, prompting an investigation by AWS.
  • Dow Jones, the New York Post, and other publishers filed lawsuits over alleged copyright infringement by Perplexity.
  • The BBC has separately issued a formal legal threat, demanding cessation of scraping and deletion of its content.

Industry Implications
Perplexity's case raises pressing questions about how AI firms access and use copyrighted content. Cloudflare's response, removing Perplexity from its list of verified bots and rolling out rules to block the stealth crawling, signals a growing push to enforce web norms and protect publishers. In contrast, OpenAI has been praised for respecting robots.txt and not trying to circumvent blocks.
The dispute underlines the urgency of establishing clearer guidelines for AI content access and ethical web scraping practices.

Conclusion
The Cloudflare accusations highlight escalating tensions between AI platforms and content owners over data usage rights. As AI firms compete for real-time information, the dispute over Perplexity's alleged stealth crawling could set a precedent for what happens when publishers and infrastructure providers push back.
