robots.txt snippets
Copy-paste-ready robots.txt snippets for blocking directories, allowing exceptions, targeting specific bots, declaring sitemaps, and managing AI crawlers.
- Allow All Bots
  - Allow All Bots to Crawl Everything with robots.txt
  - Allow All Bots and Declare a Sitemap in robots.txt
  - Allow All Bots and Declare Multiple Sitemaps in robots.txt
- Block All Bots
  - Block All Bots from Crawling the Entire Site with robots.txt
  - Block a Specific Bot from Crawling the Entire Site with robots.txt
- Block Directories
  - Block a Single Directory with robots.txt
  - Block Multiple Directories with robots.txt
  - Block a Directory but Allow a Specific File with robots.txt
- Block File Types
  - Block PDF Files from Being Crawled with robots.txt
  - Block Multiple File Types with robots.txt
  - Block Image Crawling with robots.txt
- AI Bot Control
  - Block All AI Training Crawlers with robots.txt
  - Block OpenAI Bots but Allow Search Engines with robots.txt
  - Block Google AI Training but Keep Google Search Indexing with robots.txt
- Crawl Rate Control
  - Set Crawl Delay for Bingbot with robots.txt
  - Set Crawl Delay for Yandex with robots.txt
- Sitemap Directive
  - Declare a Sitemap in robots.txt
  - Declare a Sitemap on a Different Subdomain in robots.txt
Allow All Bots
Allow All Bots to Crawl Everything with robots.txt
robots.txt with an empty Disallow value places no restrictions on any bot. All search engine bots and AI crawlers may crawl every URL on the site.
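One rough way to confirm that an empty Disallow value blocks nothing is to feed the rules to Python's standard-library urllib.robotparser. This is a sketch rather than a reference check: the bot name ExampleBot and the URLs are made-up values, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The allow-all rules for this case, embedded as a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text locally instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_rules(ROBOTS_TXT)

# ExampleBot and these URLs are hypothetical and used only for illustration.
for url in ("https://example.com/", "https://example.com/private/page.html"):
    print(url, rules.can_fetch("ExampleBot", url))
```

Both URLs should be reported as fetchable, because the empty Disallow value matches no path.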
User-agent: *
Disallow:
Allow All Bots and Declare a Sitemap in robots.txt
robots.txt can declare the XML sitemap location using the Sitemap directive. The Sitemap line is independent of the User-agent group and applies to all compliant bots.
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Allow All Bots and Declare Multiple Sitemaps in robots.txt
robots.txt supports multiple Sitemap directives. List separate sitemaps for different content types such as pages, news articles, images, or videos.
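To see how a consumer might read several Sitemap lines, the sketch below parses the rules with urllib.robotparser and lists the declared URLs. It assumes Python 3.8 or later, where RobotFileParser.site_maps() is available; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules with three Sitemap lines, embedded so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the declared URLs in file order, or None when there are none.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print(url)
```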
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml
Block All Bots
Block All Bots from Crawling the Entire Site with robots.txt
robots.txt with Disallow: / blocks all bots from crawling any URL on the site. The / path matches every URL because robots.txt uses prefix matching.
User-agent: *
Disallow: /
Block a Specific Bot from Crawling the Entire Site with robots.txt
robots.txt targets a specific bot by name using the User-agent directive. This example blocks only Googlebot while allowing all other bots to crawl the site.
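The per-bot behaviour can be sanity-checked with urllib.robotparser, as sketched here. The URL is a placeholder, and the standard-library parser matches agent names by case-insensitive substring, which is an implementation detail of that module rather than something every crawler does.

```python
from urllib.robotparser import RobotFileParser

# One group for Googlebot, one catch-all group for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/products/"  # hypothetical URL for illustration

# The named group applies to Googlebot; every other bot falls back to "*".
googlebot_allowed = parser.can_fetch("Googlebot", url)
bingbot_allowed = parser.can_fetch("Bingbot", url)

print("Googlebot allowed:", googlebot_allowed)
print("Bingbot allowed:", bingbot_allowed)
```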
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow:
Block Directories
Block a Single Directory with robots.txt
robots.txt blocks all URLs under a directory path using the Disallow directive. The trailing slash ensures only URLs within the directory are matched; without it, Disallow: /admin would also block paths such as /administrator/ and /admin.html.
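The effect of the trailing slash is easiest to see by comparing the two spellings side by side. The sketch below models a Disallow rule as a plain prefix test, ignoring wildcards; the paths are invented for the comparison.

```python
def is_blocked(path: str, disallow_prefix: str) -> bool:
    """Plain prefix match: the rule applies when the path starts with it."""
    return path.startswith(disallow_prefix)


# Hypothetical paths used only to contrast the two rule spellings.
paths = ["/admin/", "/admin/users", "/administrator/login", "/admin.html"]

for rule in ("/admin/", "/admin"):
    blocked = [p for p in paths if is_blocked(p, rule)]
    print(f"Disallow: {rule:<8} blocks {blocked}")
```

With the slash, only /admin/ and /admin/users are blocked; without it, all four paths are.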
User-agent: *
Disallow: /admin/
Block Multiple Directories with robots.txt
robots.txt accepts multiple Disallow lines within a single User-agent group. Each Disallow line blocks one path prefix.
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /tmp/
Block a Directory but Allow a Specific File with robots.txt
robots.txt combines Disallow and Allow to create exceptions within a blocked directory. Googlebot and Bingbot follow the longest (most specific) matching rule, so the longer Allow path overrides the shorter Disallow for that one file.
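Longest-match precedence can be sketched in a few lines. The function below is an illustrative model, not any crawler's actual code: it handles plain prefixes only (no wildcards) and assumes that Allow wins when two matching rules have equal length. A hand-written matcher is used because urllib.robotparser applies the first matching rule in file order, which would give a different answer here.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path), e.g. ("Disallow", "/wp-admin/")


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Apply longest-match precedence over plain path prefixes."""
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on a tie, Allow wins (assumed tie-break).
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


RULES: List[Rule] = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

# Hypothetical paths: one exception, one blocked file, one unrelated page.
for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/"):
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```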
User-agent: *
Disallow: /wp-admin/
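Allow: /wp-admin/admin-ajax.php
Block File Types
Block PDF Files from Being Crawled with robots.txt
robots.txt uses the * wildcard together with the $ end-of-URL anchor to match files by extension. This rule blocks all URLs ending in .pdf across the entire site. A URL that carries a query string after .pdf does not end in .pdf and is not matched.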
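The wildcard and end-of-URL anchor can be modelled by translating a rule into a regular expression. The sketch below is one possible translation for illustration, not the matcher any particular crawler ships; the test paths are invented.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Check a robots.txt path rule that may contain '*' and a trailing '$'."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape literal text, then rejoin the pieces with "any run of characters".
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    # With '$' the whole path must match; otherwise a prefix match is enough.
    matcher = re.fullmatch if anchored else re.match
    return matcher(pattern, path) is not None


RULE = "/*.pdf$"

# Hypothetical paths (including a query string) to exercise the rule.
for path in (
    "/report.pdf",
    "/docs/2024/report.pdf",
    "/docs/report.pdf?download=1",
    "/docs/report.pdfx",
    "/docs/report.html",
):
    print(f"{path:<32}", "blocked" if rule_matches(RULE, path) else "not matched")
```

Dropping the $ from the rule would turn it into a prefix-style match, so /docs/report.pdf?download=1 and /docs/report.pdfx would then be blocked as well.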
User-agent: *
Disallow: /*.pdf$
Block Multiple File Types with robots.txt
robots.txt blocks multiple file extensions by listing separate Disallow rules with wildcard and end-of-URL matching. This example blocks PDF, DOCX, and PPTX files.
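When the list of extensions is long or changes often, the Disallow lines can be generated rather than typed. The helper below is a hypothetical convenience function, not part of any robots.txt tooling; it leaves letter case untouched because path matching is case-sensitive.

```python
from typing import Iterable


def disallow_extensions(extensions: Iterable[str], user_agent: str = "*") -> str:
    """Build one User-agent group with a 'Disallow: /*.ext$' line per extension."""
    lines = [f"User-agent: {user_agent}"]
    for ext in extensions:
        ext = ext.strip().lstrip(".")  # accept "pdf" as well as ".pdf"
        if ext:
            lines.append(f"Disallow: /*.{ext}$")
    return "\n".join(lines) + "\n"


group = disallow_extensions(["pdf", ".docx", "pptx"])
print(group, end="")
```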
User-agent: *
Disallow: /*.pdf$
Disallow: /*.docx$
Disallow: /*.pptx$
Block Image Crawling with robots.txt
robots.txt blocks crawling of all files under an images directory. This keeps compliant crawlers, including search engine image crawlers, from fetching images hosted in that path.
User-agent: *
Disallow: /images/
AI Bot Control
Block All AI Training Crawlers with robots.txt
robots.txt blocks AI training crawlers by targeting each bot's User-agent token individually. This configuration blocks GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google AI), Meta-ExternalAgent (Meta), and PerplexityBot (Perplexity) while allowing all other bots, including search engine bots, to crawl normally. AI crawlers not named in the file fall under the User-agent: * group and remain allowed.
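Because each bot needs its own group, the file is repetitive and easy to generate from a list. The sketch below builds the groups from the five tokens named above and then checks the result with urllib.robotparser. The function name build_robots_txt and the article URL are illustrative assumptions.

```python
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the paragraph above.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "Meta-ExternalAgent", "PerplexityBot"]


def build_robots_txt(blocked_bots: Iterable[str], sitemap_url: Optional[str] = None) -> str:
    """Emit one 'Disallow: /' group per bot, then an allow-all fallback group."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in blocked_bots]
    groups.append("User-agent: *\nDisallow:\n")
    text = "\n".join(groups)  # blank line between groups
    if sitemap_url:
        text += f"\nSitemap: {sitemap_url}\n"
    return text


robots_txt = build_robots_txt(AI_BOTS, "https://example.com/sitemap.xml")
print(robots_txt, end="")

# Verify the generated file: listed bots are blocked, others are not.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
url = "https://example.com/articles/"  # hypothetical URL
blocked = {bot: not parser.can_fetch(bot, url) for bot in AI_BOTS}
googlebot_allowed = parser.can_fetch("Googlebot", url)
```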
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Block OpenAI Bots but Allow Search Engines with robots.txt
robots.txt blocks two OpenAI crawlers (GPTBot for training, OAI-SearchBot for search indexing) while keeping the site accessible to Googlebot and Bingbot.
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: *
Disallow:
Block Google AI Training but Keep Google Search Indexing with robots.txt
robots.txt blocks Google-Extended to prevent Google from using site content for Gemini AI training. Blocking Google-Extended does not affect Googlebot search crawling or indexing.
User-agent: Google-Extended
Disallow: /
User-agent: Googlebot
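Disallow:
Crawl Rate Control
Set Crawl Delay for Bingbot with robots.txt
robots.txt uses Crawl-delay to set the minimum seconds between requests from Bingbot. Google does not support Crawl-delay and ignores the line. The crawl rate limiter in Google Search Console was retired in January 2024; Google's crawl documentation now suggests temporarily returning 500, 503, or 429 responses to slow Googlebot down.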
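A crawler that honours Crawl-delay has to read the value from the group that applies to it. The sketch below shows how that lookup behaves in urllib.robotparser (crawl_delay() is available from Python 3.6); ExampleBot is a made-up name standing in for any bot without its own group.

```python
from urllib.robotparser import RobotFileParser

# Crawl-delay is set only inside the bingbot group.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /staging/

User-agent: *
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The delay comes from the matching group; it is None when that group sets none.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("ExampleBot")  # hypothetical bot, uses "*"

print("bingbot delay:", bing_delay)
print("ExampleBot delay:", other_delay)
```

The delay is scoped to the group it appears in, so bots matched by the * group get no delay unless that group declares one too.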
User-agent: bingbot
Crawl-delay: 10
Disallow: /staging/
User-agent: *
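Disallow: /staging/
Set Crawl Delay for Yandex with robots.txt
robots.txt sets a Crawl-delay for YandexBot independently from other bots. Yandex interpreted the value as the number of seconds between consecutive requests, but its documentation states that the directive has been ignored since February 2018 and that crawl rate is set in Yandex Webmaster instead, so treat this snippet as legacy configuration.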
User-agent: YandexBot
Crawl-delay: 5
Disallow:
Sitemap Directive
Declare a Sitemap in robots.txt
robots.txt uses the Sitemap directive to tell search engine bots the location of the XML sitemap. The directive requires the full URL including the protocol. Place the Sitemap line outside any User-agent group because it applies globally.
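A quick lint for the full-URL requirement might look like the sketch below. It only checks that each Sitemap value has an http or https scheme and a host; the function names and sample lines are illustrative, and it does not verify that the sitemap actually exists.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def is_absolute_sitemap_url(value: str) -> bool:
    """True when the value has an http(s) scheme and a host name."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_sitemap_lines(robots_txt: str) -> List[Tuple[str, bool]]:
    """Return (url, ok) pairs for every Sitemap line in the text."""
    results = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep and name.strip().lower() == "sitemap":
            results.append((value.strip(), is_absolute_sitemap_url(value)))
    return results


# Hypothetical sample: one valid declaration and two common mistakes.
SAMPLE = """\
Sitemap: https://example.com/sitemap.xml
Sitemap: /sitemap.xml
Sitemap: example.com/sitemap.xml
"""

for url, ok in check_sitemap_lines(SAMPLE):
    print(f"{url:<36}", "ok" if ok else "not an absolute URL")
```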
Sitemap: https://example.com/sitemap.xml
Declare a Sitemap on a Different Subdomain in robots.txt
robots.txt accepts sitemap URLs pointing to any accessible location, including a different subdomain. The sitemap URL must be a fully qualified absolute URL.
User-agent: *
Disallow:
Sitemap: https://cdn.example.com/sitemap.xml