Cloudflare Reveals Stealth Crawling by Perplexity AI
News Desk

Cloudflare says it has observed stealth crawling by Perplexity, an AI-powered answer engine. Perplexity initially crawls with its declared user agent, but when it encounters a network block it appears to obscure its identity in order to bypass the website’s restrictions.

According to Cloudflare, Perplexity repeatedly changes its user agent and source ASNs to hide its crawling activity. The crawler also ignores robots.txt files, or sometimes does not fetch them at all. For over 30 years, the internet has been built on trust: crawlers are expected to be transparent, serve a clear purpose, and follow website directives. Cloudflare says Perplexity’s behavior contradicts these expectations. As a result, it has de-listed Perplexity as a verified bot and added new managed rules to block the stealth crawling.

Cloudflare also outlined how well-behaved bots should operate. They should identify themselves clearly through a unique user agent, declared IP ranges, or Web Bot Auth integration. They should not flood sites with traffic or scrape sensitive data stealthily. Each bot should serve a distinct purpose, and it should respect robots.txt rules and rate limits.
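
To make those expectations concrete, here is a minimal Python sketch of the crawler side: a bot that declares one fixed user agent, checks robots.txt before fetching, and reads the requested crawl delay. The bot name `ExampleBot`, its info URL, and the robots.txt content are hypothetical and are not taken from Cloudflare’s report; the parser is Python’s standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity and robots.txt content, for illustration only.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser: RobotFileParser, url: str) -> tuple[bool, int]:
    """Return (allowed, delay_seconds) for the bot's declared user agent."""
    allowed = parser.can_fetch(USER_AGENT, url)
    # crawl_delay() returns None when the site sets no delay for this agent.
    delay = parser.crawl_delay(USER_AGENT) or 0
    return allowed, delay


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/public/page",
                "https://example.com/private/report"):
        allowed, delay = plan_fetch(parser, url)
        print(f"{url}: allowed={allowed}, wait {delay}s between requests")
```

A real crawler would also send `USER_AGENT` in the `User-Agent` header of every request and keep it unchanged when a site refuses access, which is the opposite of the rotation Cloudflare describes.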

Customers can protect themselves with Cloudflare’s bot management system, which scores the undeclared crawling from Perplexity as bot activity. Customers with existing block or challenge rules are already protected. Cloudflare has also added signature matches for the stealth crawler to its managed rules. These protections are available to all customers, including those on the free plan.

Cloudflare reported that, since its Content Independence Day announcement, more than 2.5 million websites have chosen to disallow AI training through its managed robots.txt feature or its rules that block AI crawlers. These tools let content owners control which AI crawlers can access their content.
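
As a rough illustration of what such a robots.txt policy looks like from the site owner’s side, the sketch below parses a hand-written robots.txt that turns away two AI crawler tokens while leaving other crawlers unrestricted. The file content, the choice of tokens, and the helper name `check_agents` are assumptions for this example; it is not the file that Cloudflare’s managed robots.txt feature generates.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy only; not Cloudflare's managed robots.txt.
AI_BLOCKING_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def check_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether robots.txt permits it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    result = check_agents(
        AI_BLOCKING_ROBOTS_TXT,
        ["GPTBot", "PerplexityBot", "Googlebot"],
        "https://example.com/article",
    )
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

robots.txt is advisory: it only works against crawlers that identify themselves honestly and choose to comply, which is why Cloudflare pairs it with blocking rules that are enforced at the network level.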

Cloudflare is working with technical and policy experts worldwide, including the IETF, to standardize extensions to robots.txt. The goal is to establish clear principles for ethical bot operation in this evolving space.

Key points:

  • Cloudflare says Perplexity AI uses stealth tactics to hide its crawling activity.
  • Cloudflare blocks this behavior via updated managed rules.
  • Over 2.5 million websites have disallowed AI training using Cloudflare’s tools.