Cloudflare Accuses Perplexity of Evading Scraping Blocks on Thousands of Websites
August 4, 2025
AI startup Perplexity is under fire after internet infrastructure provider Cloudflare published evidence accusing the company of intentionally bypassing anti-scraping protections on websites that had explicitly opted out of being crawled.
According to Cloudflare’s research released Monday, the AI company obscured its identity and actively circumvented common website safeguards, including robots.txt files and bot-blocking rules. The alleged activity involved spoofing browser information and rotating network identifiers to appear legitimate while scraping content — behavior Cloudflare described as “an attempt to circumvent the website’s preferences.”
“This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare stated, adding that it was able to fingerprint the activity using machine learning and network analysis.
Perplexity, which builds AI tools dependent on vast amounts of online data, denied the allegations. Company spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” claiming the screenshots presented showed “no content was accessed.” Dwyer further stated that the bot identified in the post “isn’t even ours.”
Cloudflare, however, claims it validated the scraping through internal testing after receiving complaints from customers whose sites were being accessed despite having protections in place. The company said Perplexity used not only its known bot identifiers but also generic browser headers mimicking Google Chrome to sneak past defenses — a tactic typically used by scrapers trying to remain undetected.
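To make the mechanics concrete, the sketch below shows why a robots.txt rule only works when a crawler identifies itself honestly. It is an illustration under stated assumptions, not Cloudflare's detection method or any company's actual crawler: the bot name `ExampleAIBot`, the robots.txt contents, the URL, and the helper `is_allowed` are all hypothetical. The parser is the one in Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts out of one named AI crawler
# and allows everything else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that declares its (hypothetical) name is matched by the opt-out rule.
declared_agent = "ExampleAIBot/1.0"

# A generic browser-style string is not matched by that rule,
# so only the wildcard entry applies.
generic_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

url = "https://example.com/article"
print(is_allowed(ROBOTS_TXT, declared_agent, url))
print(is_allowed(ROBOTS_TXT, generic_agent, url))
```

The first check is refused and the second is permitted. Because robots.txt rules are matched against a self-declared name, they express a site's preference but cannot enforce it. That is why infrastructure providers fall back on other signals, such as the network-level fingerprinting Cloudflare describes.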
In response, Cloudflare has removed Perplexity’s crawlers from its list of verified bots and deployed new protections to help sites block similar behavior.
The dispute highlights growing tensions between AI companies and internet infrastructure providers over how online content is accessed and used. AI startups often train their models using massive datasets pulled from the web — sometimes without the explicit consent of content owners — prompting backlash from publishers, researchers, and digital rights advocates.
Cloudflare, which has increasingly taken a stance against unauthorized AI scraping, recently launched a marketplace for websites to monetize AI crawler access and unveiled tools to block unwanted data harvesting. CEO Matthew Prince has warned that generative AI poses a threat to the internet’s economic model, particularly for publishers.
This isn’t the first time Perplexity has been accused of crossing ethical or legal lines. Last year, media outlets including Wired alleged the company was reproducing their content without permission. In a high-profile interview at Disrupt 2024, Perplexity CEO Aravind Srinivas was pressed on the company’s handling of intellectual property and struggled to define what it considered plagiarism.
As the debate over data ownership and AI scraping intensifies, cases like this may set important precedents for how the web, and the companies building on top of it, will coexist in the AI era.