Top AI crawlers, last 30 days

Hits = absolute count. Share = % of total crawler traffic (the 6.7% slice above), not % of all site traffic.

Bot | Operator | Hits | Share of crawlers | WoW | robots.txt token | Note
GPTBot | OpenAI | 2,847 | 46.7% | +18% | GPTBot | Training crawler. Aggressive in May 2026.
ClaudeBot | Anthropic | 1,612 | 26.5% | +22% | ClaudeBot | Training crawler. Polite; respects Crawl-delay.
PerplexityBot | Perplexity | 802 | 13.2% | +30% | PerplexityBot | Fastest-growing AI crawler in this snapshot. Focused on answer grounding.
OAI-SearchBot | OpenAI | 412 | 6.8% | +12% | OAI-SearchBot | ChatGPT Search indexing. Separate token from GPTBot.
CCBot | Common Crawl | 148 | 2.4% | -5% | CCBot | Feeds many downstream LLMs. Monthly bulk crawl.
Claude-Web | Anthropic | 92 | 1.5% | +8% | Claude-Web | Claude's user-initiated web fetch. Different from ClaudeBot.
Bytespider | ByteDance | 71 | 1.2% | -20% | Bytespider | Powers Doubao and TikTok search. Heavy traffic on some sites.
Applebot-Extended | Apple | 48 | 0.8% | +15% | Applebot-Extended | Apple Intelligence training opt-out signal.
FacebookBot | Meta | 32 | 0.5% | +3% | FacebookBot | Meta AI. Lower volume than the OpenAI and Anthropic crawlers.
cohere-ai | Cohere | 19 | 0.3% | ±0% | cohere-ai | Cohere's training fetch. Niche but present.
DuckAssistBot | DuckDuckGo | 11 | 0.2% | +25% | DuckAssistBot | DuckDuckGo AI Assist. First appeared in our logs within this 30-day window.
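The Share column can be reproduced from the Hits column: each bot's hits divided by the 6,094 total crawler hits in the table, not by all site traffic. A minimal Python sketch; the `wow_change` helper is illustrative only and uses hypothetical weekly counts, since prior-week numbers are not published here.

```python
# Hits per crawler over the last 30 days, copied from the table above.
HITS = {
    "GPTBot": 2847,
    "ClaudeBot": 1612,
    "PerplexityBot": 802,
    "OAI-SearchBot": 412,
    "CCBot": 148,
    "Claude-Web": 92,
    "Bytespider": 71,
    "Applebot-Extended": 48,
    "FacebookBot": 32,
    "cohere-ai": 19,
    "DuckAssistBot": 11,
}


def crawler_shares(hits):
    """Each bot's share of total AI-crawler hits, in percent (not of all traffic)."""
    total = sum(hits.values())
    return {bot: 100 * count / total for bot, count in hits.items()}


def wow_change(this_week, last_week):
    """Week-over-week change in percent.

    Undefined (ZeroDivisionError) for a bot with no prior-week hits.
    """
    return 100 * (this_week - last_week) / last_week


shares = crawler_shares(HITS)
for bot, pct in shares.items():
    print(f"{bot:<18} {HITS[bot]:>6,} {pct:5.1f}%")

# Hypothetical weekly counts, only to show the WoW arithmetic.
print(f"{wow_change(118, 100):+.0f}%")  # prints +18%
```

Rounded to one decimal, the computed shares match the table (GPTBot 46.7%, PerplexityBot 13.2%).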

What's interesting in this snapshot

OpenAI and Anthropic dominate. GPTBot, ClaudeBot, OAI-SearchBot, and Claude-Web together account for just over 80% of the AI crawler hits we see (4,963 of 6,094). PerplexityBot is the fastest-growing, up 30% week-over-week.
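The "over 80%" figure is a straight sum over the table's Hits column. A short Python check, using only the numbers published above:

```python
# Hits for the four OpenAI and Anthropic tokens, from the table above.
OPENAI_ANTHROPIC_HITS = {
    "GPTBot": 2847,
    "ClaudeBot": 1612,
    "OAI-SearchBot": 412,
    "Claude-Web": 92,
}
TOTAL_CRAWLER_HITS = 6094  # sum of the full Hits column

combined = sum(OPENAI_ANTHROPIC_HITS.values())
combined_share = 100 * combined / TOTAL_CRAWLER_HITS
print(f"{combined:,} of {TOTAL_CRAWLER_HITS:,} crawler hits = {combined_share:.1f}%")
# prints: 4,963 of 6,094 crawler hits = 81.4%
```

Summing the rounded Share column instead gives 81.5%; the small gap is rounding.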

Bytespider down. ByteDance's crawler was aggressive through 2024-2025; in this snapshot it has cooled off (-20% WoW), consistent with our reading that its robots.txt compliance is improving.

DuckAssistBot is new. It first appeared in our logs within this 30-day window; volume is small (11 hits) but rising 25% WoW.

Common Crawl roughly flat. CCBot is down 5% WoW, but its volume is tied to its monthly bulk-crawl cadence: it usually arrives as a single multi-day spike rather than as steady traffic.

Method

Hits are classified with traffic_class_breakdown into eight Cloudflare-compatible buckets, including ai_crawler (this index), ai_user_action (Claude or ChatGPT fetching a page on a user's behalf; not shown here), and verified_search_bot (Google, Bing; also not shown).
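A minimal sketch of the bucketing step, assuming simple case-insensitive substring matching on the User-Agent header. The pattern lists are a hypothetical, trimmed stand-in for our open pattern list (using ChatGPT-User as the ai_user_action example is an assumption), and `other` is a hypothetical catch-all for the buckets not named here. The `verified` flag stands in for the FCrDNS check described under Method.

```python
from typing import Optional

# Hypothetical, heavily trimmed pattern lists for illustration only.
# Order matters: more specific buckets are checked first.
BUCKET_PATTERNS = [
    ("ai_user_action", ("ChatGPT-User",)),
    ("ai_crawler", ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
                    "CCBot", "Bytespider", "DuckAssistBot")),
    ("verified_search_bot", ("Googlebot", "bingbot")),
]


def classify(user_agent: str, verified: Optional[bool]) -> str:
    """Map one request to a traffic bucket.

    verified: True/False from an identity check (e.g. FCrDNS), or None when
    the operator publishes nothing to verify against.
    """
    ua = user_agent.lower()
    for bucket, tokens in BUCKET_PATTERNS:
        if any(token.lower() in ua for token in tokens):
            # A claimed identity that fails verification is demoted, and the
            # search bucket additionally requires a positive verification.
            if verified is False or (bucket == "verified_search_bot" and verified is not True):
                return "unverified_bot"
            return bucket
    return "other"  # hypothetical catch-all for the remaining buckets


# Simplified example User-Agent strings.
for ua, verified in [
    ("Mozilla/5.0 (compatible; GPTBot/1.1)", True),
    ("Mozilla/5.0 (compatible; GPTBot/1.1)", False),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", True),
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", None),
]:
    print(classify(ua, verified))
```

The same GPTBot string lands in ai_crawler or unverified_bot depending only on the verification result, which is the behavior the Method section describes for spoofed requests.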

User-agent matching uses our open pattern list, plus forward-confirmed reverse DNS (FCrDNS) verification for operators that publish verifiable IP ranges (Anthropic, OpenAI). FCrDNS mitigates UA spoofing by round-tripping DNS: a reverse lookup on the client IP must return a hostname in the operator's domain, and a forward lookup on that hostname must resolve back to the same IP. Requests that claim to be GPTBot but come from cloud IPs that fail this check land in unverified_bot, not here.
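A minimal FCrDNS sketch in Python, as the technique is generally practiced: PTR lookup, domain check, then forward lookup that must return the original IP. The domain suffix `bots.example-ai.com` and the IPs are hypothetical placeholders (documentation-only ranges); each operator's real hostname suffix would come from its own documentation. DNS lookups are injectable so the demo runs offline.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def system_reverse(ip: str) -> str:
    """PTR lookup through the OS resolver (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def system_forward(hostname: str) -> List[str]:
    """A/AAAA lookup through the OS resolver (raises OSError on failure)."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def fcrdns_verified(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = system_reverse,
    forward: Callable[[str], List[str]] = system_forward,
) -> bool:
    """True only if ip -> hostname -> ip round-trips inside an allowed domain."""
    suffixes = [s.lower().rstrip(".") for s in allowed_suffixes]
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the claim cannot be verified
    # Whoever owns the IP controls its PTR record, so check the domain...
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        forward_ips = forward(hostname)
    except OSError:
        return False
    # ...and require the operator's own DNS to point back at the same IP.
    claimed = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(addr) == claimed for addr in forward_ips)


# Offline demo: fake DNS tables using documentation-only IP ranges.
PTR = {
    "203.0.113.7": "crawl-7.bots.example-ai.com.",   # genuine crawler
    "198.51.100.9": "crawl-7.bots.example-ai.com.",  # spoofed PTR on a cloud IP
}
A = {"crawl-7.bots.example-ai.com": ["203.0.113.7"]}


def fake_reverse(ip: str) -> str:
    if ip not in PTR:
        raise socket.herror(f"no PTR record for {ip}")
    return PTR[ip]


def fake_forward(hostname: str) -> List[str]:
    if hostname not in A:
        raise socket.gaierror(f"no address for {hostname}")
    return A[hostname]


SUFFIXES = ["bots.example-ai.com"]  # hypothetical operator domain
for ip in ["203.0.113.7", "198.51.100.9", "192.0.2.1"]:
    print(ip, fcrdns_verified(ip, SUFFIXES, fake_reverse, fake_forward))
```

The second IP shows why the forward step matters: its PTR record claims the operator's domain, but the operator's forward record points elsewhere, so verification fails.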

What to do with this data

If you're a site operator:

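  • Decide whether to Allow: or Disallow: each bot in your robots.txt. For an SEO/discoverability play, allow GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot. For an opt-out-of-training play, disallow the training crawlers (GPTBot, ClaudeBot, CCBot); because OAI-SearchBot is a separate token from GPTBot, you can still allow it and PerplexityBot for answer grounding.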
  • Watch for over-aggressive crawlers (excessive hits, slower page loads). If one shows up, throttle it with Cloudflare's bot management or your hosting platform's rate limiting.
  • Read this alongside our llms.txt explainer: robots.txt Allow rules, an llms.txt file, and standard SEO together make up the complete AI-search stack.
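An opt-out-of-training policy can be generated and sanity-checked with Python's standard `urllib.robotparser` before it goes live. This sketch encodes one hypothetical policy: it disallows GPTBot, ClaudeBot, and CCBot (described in the table as training or training-feed crawlers) and keeps OAI-SearchBot and PerplexityBot (search indexing, answer grounding) allowed. The bot lists and the 10-second Crawl-delay are assumptions to adapt. Crawl-delay is not part of the robots.txt standard (RFC 9309), so only some crawlers honor it; the table notes that ClaudeBot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical opt-out-of-training policy; adjust the lists for your site.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]
ANSWER_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt(training_bots, answer_bots, crawl_delay=None):
    """Render robots.txt: block training crawlers, allow answer engines."""
    lines = [f"User-agent: {bot}" for bot in training_bots]
    lines += ["Disallow: /", ""]  # the blank line ends this group
    lines += [f"User-agent: {bot}" for bot in answer_bots]
    lines.append("Allow: /")
    if crawl_delay is not None:
        # Non-standard directive; crawlers that don't support it ignore it.
        lines.append(f"Crawl-delay: {crawl_delay}")
    lines += ["", "User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(TRAINING_BOTS, ANSWER_BOTS, crawl_delay=10)
print(robots_txt)

# Sanity-check the policy with the standard-library parser before deploying.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in TRAINING_BOTS + ANSWER_BOTS:
    allowed = parser.can_fetch(bot, "https://example.com/blog/post")
    print(f"{bot:<15} allowed={allowed} crawl_delay={parser.crawl_delay(bot)}")
```

`robotparser` matches groups with a simplified substring test on the bot's product token, so treat this as a check of the file's intent, not a guarantee of how every crawler will interpret it.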

Track your own crawler share

Sign up free at mcp-analytics.com, paste the tracking snippet on your site, and ask Claude:

"How much of mysite.com's traffic is ai_crawler?"
"Show me top user agents in the ai_crawler class last 30 days."
"WoW change in GPTBot hits."

Free tier: 100,000 hits/month, unlimited sites, no card. Your contributions feed the aggregate index once we reach the launch threshold. No identifiable per-site data is published.