An analysis by Hostinger of 66.7 billion bot requests across more than 5 million websites reveals a major shift in how automated crawlers index the web. While traditional search engine bots like Googlebot remain stable, AI assistant crawlers — including OpenAI’s Search crawler — are expanding their reach, covering more than half of the sampled websites. At the same time, many websites are actively blocking training-focused AI bots such as GPTBot. This shift has a significant impact on content visibility, SEO strategy, and the way websites interact with AI-assisted discovery.
This article offers a thorough, well-grounded, and up-to-date analysis of these changes, presenting the data in context and suggesting the actions publishers and SEO professionals should take. It has been optimized for Google’s AI Overviews, search snippets, and AI tools such as ChatGPT and Perplexity.
Understanding the Hostinger Bot Crawl Study
Hostinger’s study measured web crawler activity across three separate 6-day periods. To identify bot types, the AI.txt classification was used to distinguish between regular search engines, AI training bots, AI assistants, SEO tools, and other automated agents. The dataset spans more than 5 million domains, giving a comprehensive view of how online content is accessed.
Key findings included:
- AI assistant-oriented crawlers are rapidly gaining visibility across the web.
- Training bot coverage is declining as more sites block those crawlers.
- Traditional search engine bots remain stable, reflecting their continued importance.
- SEO and marketing tool crawlers show shrinking coverage, likely due to site owners blocking high-resource crawlers.
AI Crawlers vs. Traditional Search Engine Bots
The study grouped crawlers into several categories. Understanding these helps contextualize the landscape:
- Classic Search Engine Bots: These include Googlebot and Bingbot, which index content for search engines. Googlebot maintained about 72% coverage of sites, while Bingbot hovered around 57.67%. These bots ensure content is discoverable in traditional search results.
- AI Training Bots: Crawlers such as OpenAI’s GPTBot and Meta’s ExternalAgent collect large amounts of web data to train AI models. Their coverage has decreased sharply as webmasters block them via robots.txt or server rules. GPTBot’s coverage dropped dramatically from about 84% to roughly 12%, reflecting this blocking trend.
- AI Assistant/Affiliated Crawlers: This category includes OAI-SearchBot, TikTok’s bot, AppleBot, and similar crawlers that fetch content in response to user-triggered queries and AI search tools. OpenAI’s SearchBot reached about 55.67% coverage, making it one of the most widespread crawlers in this segment.
- SEO and Marketing Tools: Bots like Ahrefs now focus on active SEO efforts rather than broad crawling. Their total coverage is declining as some sites block these bots to conserve resources.
Why AI Training Bots Are Being Blocked
AI training bots, which collect broad datasets for model training, are increasingly disallowed in robots.txt files or blocked by servers. This reflects a growing desire by site owners to control how their data is used. Reasons include:
- Content ownership concerns: Sites may not want proprietary or premium content ingested into third-party training datasets.
- Performance issues: High-volume crawling by training bots can strain server resources.
Hostinger’s data aligns with other industry observations showing that many top news and institutional sites have started blocking AI training bots. For example, a study of robots.txt usage among major websites found substantial blocking of AI-related crawlers to protect their content.
Why Assistant AI Crawlers Are Expanding
In contrast to training bots, assistant-oriented crawlers are not only tolerated but often deliberately allowed because they bring user discoverability benefits. These bots serve content directly to users through AI search interfaces or assistant features. OpenAI, for example, recommends that site owners allow the OAI-SearchBot if they want to appear in ChatGPT’s search results.
This distinction is important:
- Training bots gather data for model improvement but may not send traffic back to the site.
- Assistant bots fetch content in response to user inquiries and can drive visibility and traffic by feeding content into AI-mediated search tools.
How OpenAI Crawlers Work
OpenAI operates multiple crawlers with distinct purposes:
- GPTBot: Primarily for training large language models.
- OAI-SearchBot: Indexes content for features like ChatGPT search, enabling content to surface in AI search results.
- ChatGPT-User: Fetches specific web content during user-initiated browsing sessions.
Webmasters can manage their interaction with these bots via robots.txt. For example, a robots.txt configuration can block GPTBot while allowing OAI-SearchBot and ChatGPT-User, balancing control over training usage with discoverability in AI search.
Implications for Website Owners and SEO
The Hostinger study indicates a practical division in how bots are treated and how content is accessed:
1. AI Search Visibility Is Becoming Critical
AI assistant crawlers like OpenAI’s SearchBot now reach more than half of the sites sampled. Sites that want visibility in AI search results — including ChatGPT’s search features — benefit from allowing those crawlers. This is analogous to allowing traditional search engines to index your content for organic visibility.
2. Balancing Crawler Access and Resource Costs
While allowing assistant crawlers can improve AI discoverability, site owners must consider server load and bandwidth usage. High-traffic sites can use CDN-level rules or rate limits to manage bot traffic without fully blocking beneficial crawlers.
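To illustrate one way of doing this at the application layer, here is a minimal sketch of a WSGI middleware that throttles selected crawler user agents. The bot names, limits, and the choice to handle this in Python rather than at the CDN or reverse proxy are illustrative assumptions, not a recommendation for any particular stack.

import time
from collections import defaultdict

class BotRateLimiter:
    # Allow at most max_requests per window (seconds) for each listed bot;
    # all other traffic passes through untouched.
    def __init__(self, app, max_requests=10, window=60,
                 throttled_agents=("GPTBot", "AhrefsBot")):  # assumed bot names
        self.app = app
        self.max_requests = max_requests
        self.window = window
        self.throttled_agents = throttled_agents
        self.hits = defaultdict(list)  # bot name -> recent request timestamps

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        bot = next((a for a in self.throttled_agents if a in ua), None)
        if bot:
            now = time.time()
            recent = [t for t in self.hits[bot] if now - t < self.window]
            if len(recent) >= self.max_requests:
                start_response("429 Too Many Requests",
                               [("Content-Type", "text/plain"),
                                ("Retry-After", str(self.window))])
                return [b"Rate limit exceeded"]
            recent.append(now)
            self.hits[bot] = recent
        return self.app(environ, start_response)

Wrapping an existing WSGI application (for example, app = BotRateLimiter(app)) is enough to apply the limit; in practice, a CDN or reverse-proxy rule achieves the same effect with less code.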
3. Robots.txt and AI.txt Strategy
Using both robots.txt and AI.txt (a proposed standard to differentiate access for different classes of AI crawlers) gives webmasters more control. They can block broad dataset collection while allowing selective crawlers that benefit traffic and user engagement.
4. Evolving SEO Priorities
Traditional search indexing remains important, with Googlebot still widely covering sites. However, AI search indexing now competes with, and sometimes complements, classic search. SEO strategies must adapt to ensure content is structured and accessible to both traditional and AI crawlers.
Real-World Examples and Trends
Example: Website Visibility in AI Search
A publisher that allows OpenAI’s SearchBot but blocks GPTBot may not contribute its content to AI training datasets, but it can still appear in ChatGPT search results. This can lead to visibility in AI-driven search features while protecting proprietary content.
Blocking Training Bots but Allowing Assistant Bots
Many sites today take a middle path:
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Allow OpenAI's search/indexing crawler
User-agent: OAI-SearchBot
Allow: /

# Allow on-demand fetches triggered by ChatGPT users
User-agent: ChatGPT-User
Allow: /
This configuration blocks broad training crawlers while welcoming assistant bots that can drive discovery and traffic.
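Before publishing rules like these, you can sanity-check them locally with Python’s standard urllib.robotparser module. The sketch below uses a hypothetical example.com URL; how faithfully each bot honors the rules is ultimately up to the crawler itself.

from urllib.robotparser import RobotFileParser

# The same rules as above, parsed locally instead of fetched from a site.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    allowed = parser.can_fetch(bot, "https://example.com/articles/sample-post")
    print(bot, "is", "allowed" if allowed else "blocked")
# Expected output: GPTBot is blocked; OAI-SearchBot and ChatGPT-User are allowed.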
FAQs: OpenAI’s Search Crawler and Web Coverage
Q1: What does 55% coverage mean?
It means that OpenAI’s Search crawler (OAI-SearchBot) accessed content on about 55.67% of the 5 million+ websites analyzed in the Hostinger study — a widespread footprint among AI assistant crawlers.
Q2: Why is GPTBot coverage declining?
GPTBot, used for collecting training data, is increasingly blocked by robots.txt or server rules, reflecting site owners’ concern about content use in training without direct benefit.
Q3: Should I block all AI crawlers?
Blocking training bots can protect proprietary content, but blocking assistant AI crawlers can limit your visibility in AI search tools. A selective approach is generally recommended.
Q4: Does this replace traditional SEO?
No. Traditional search engine crawlers like Googlebot still cover large portions of the web and remain essential for organic search visibility. AI crawling adds another dimension but complements traditional SEO rather than replacing it.
Q5: How can I check which bots crawl my site?
Review your server logs for crawler user-agent strings, and use bot identification resources such as the AI.txt standard or SEO tools to classify the activity. Monitoring logs helps you tailor your access rules strategically.
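As a quick illustration, a short script like the one below tallies crawler hits from a standard access log. The log path and the list of bot names are assumptions to adjust for your own server and the crawlers you care about.

from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path; use your server's log
BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Googlebot", "Bingbot", "AhrefsBot"]

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="ignore") as log:
    for line in log:
        for bot in BOTS:
            if bot in line:
                counts[bot] += 1

for bot, hits in counts.most_common():
    print(f"{bot}: {hits} requests")

Reviewing these counts over time shows which crawlers actually visit your site and how often, which makes it easier to decide what to allow, throttle, or block.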
Conclusion: The Emerging Role of AI Crawlers
The Hostinger study highlights a significant shift in how web content is discovered and indexed. Traditional search engine bots remain the main players, but AI assistant crawlers are gaining ever-broader access, a clear sign that the way both users and tools find information is changing. By allowing AI search bots like OpenAI’s SearchBot to crawl your website, you can gain visibility in AI-driven discovery without necessarily making your content available to training bots.
As a trusted web development company in India, we deliver secure, scalable, and high-performing web solutions. If you’re looking for reliable web development services in India, contact us today to start building your digital success.
