robots.txt snippets

Copy-paste-ready robots.txt snippets for blocking directories, allowing exceptions, targeting specific bots, declaring sitemaps, and managing AI crawlers.

Allow All Bots

Allow All Bots to Crawl Everything with robots.txt

robots.txt with an empty Disallow value places no restrictions on any bot. All search engine bots and AI crawlers may crawl every URL on the site.

User-agent: *
Disallow:

Allow All Bots and Declare a Sitemap in robots.txt

robots.txt can declare the XML sitemap location using the Sitemap directive. The Sitemap line is independent of the User-agent group and applies to all compliant bots.

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

Allow All Bots and Declare Multiple Sitemaps in robots.txt

robots.txt supports multiple Sitemap directives. List separate sitemaps for different content types such as pages, news articles, images, or videos.

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml
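
As a quick sanity check, the Sitemap lines can be read back with Python's standard-library urllib.robotparser. This is a minimal sketch: the robots_txt string is the snippet above pasted inline rather than fetched from a live site, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The snippet above, pasted inline for illustration.
robots_txt = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the declared sitemap URLs in file order,
# or None when the file declares no sitemaps.
sitemaps = parser.site_maps()
for url in sitemaps:
    print(url)
```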

Block All Bots

Block All Bots from Crawling the Entire Site with robots.txt

robots.txt with Disallow: / tells all compliant bots not to crawl any URL on the site. The / path matches every URL because robots.txt uses prefix matching.

User-agent: *
Disallow: /

Block a Specific Bot from Crawling the Entire Site with robots.txt

robots.txt targets a specific bot by name using the User-agent directive. This example blocks only Googlebot while allowing all other bots to crawl the site.

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
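
One way to confirm that the rules target only the named bot is to evaluate them with Python's standard-library urllib.robotparser. This is a minimal sketch: the rules are pasted inline, and the page URL is a placeholder chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The snippet above, pasted inline for illustration.
robots_txt = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://example.com/any-page"  # placeholder URL
googlebot_allowed = parser.can_fetch("Googlebot", page)
bingbot_allowed = parser.can_fetch("Bingbot", page)

print("Googlebot allowed:", googlebot_allowed)
print("Bingbot allowed:", bingbot_allowed)
```

The named Googlebot group takes precedence over the * group for Googlebot, while every other bot falls through to the * group.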

Block Directories

Block a Single Directory with robots.txt

robots.txt blocks all URLs under a directory path using the Disallow directive. The trailing slash limits the match to URLs inside the directory; without it, Disallow: /admin would also match /administrator and /admin.html.

User-agent: *
Disallow: /admin/
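
The effect of the trailing slash can be checked with a small sketch built on Python's standard-library urllib.robotparser. The helper name is_blocked, the bot name ExampleBot, and the URLs are assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(disallow_path, url):
    """Return True if a single 'Disallow: <path>' rule for all bots blocks url."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not parser.can_fetch("ExampleBot", url)

# With the trailing slash, only URLs inside the directory match.
print(is_blocked("/admin/", "https://example.com/admin/users"))
print(is_blocked("/admin/", "https://example.com/administrator"))

# Without it, any path that merely starts with /admin matches as well.
print(is_blocked("/admin", "https://example.com/administrator"))
```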

Block Multiple Directories with robots.txt

robots.txt accepts multiple Disallow lines within a single User-agent group. Each Disallow line blocks one path prefix.

User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /tmp/

Block a Directory but Allow a Specific File with robots.txt

robots.txt combines Disallow and Allow to create exceptions within a blocked directory. Googlebot and Bingbot follow the longest (most specific) matching rule, so the Allow path wins for admin-ajax.php because it is longer than the Disallow path.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
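
The longest-match behavior can be sketched in a few lines of Python. This is an illustrative model, not any crawler's actual implementation: the function name is_allowed and the (directive, path) tuple format are assumptions, wildcards are ignored, and ties are resolved in favor of Allow. Python's standard-library urllib.robotparser is not used here because it applies the first matching rule rather than the longest one.

```python
def is_allowed(rules, url_path):
    """Apply longest-prefix matching to a list of (directive, path) rules.

    directive is "allow" or "disallow". A URL that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not url_path.startswith(prefix):
            continue  # empty or non-matching rules do not apply
        longer = len(prefix) > best_length
        tie_goes_to_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_goes_to_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed

rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # longer Allow rule wins
print(is_allowed(rules, "/wp-admin/options.php"))     # only the Disallow rule matches
print(is_allowed(rules, "/blog/"))                    # no rule matches
```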

Block File Types

Block PDF Files from Being Crawled with robots.txt

robots.txt uses the * wildcard and the $ end-of-URL anchor to match files by extension. This rule blocks all URLs ending in .pdf across the entire site; a URL such as /file.pdf?download=1 does not end in .pdf and is not matched.

User-agent: *
Disallow: /*.pdf$
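
To see which URLs a wildcard rule would match, the pattern can be translated into a regular expression. The sketch below is a simplified model of * and $ matching; the function names are made up for this illustration. It is hand-rolled because Python's standard-library urllib.robotparser does prefix matching only and does not interpret * or $.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Without '$', the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def matches(pattern, url_path):
    """Return True if url_path (path plus query string) matches the pattern."""
    return robots_pattern_to_regex(pattern).match(url_path) is not None

print(matches("/*.pdf$", "/docs/report.pdf"))             # ends in .pdf
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))  # query string follows .pdf
print(matches("/*.pdf$", "/docs/report.html"))            # different extension
```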

Block Multiple File Types with robots.txt

robots.txt blocks multiple file extensions by listing separate Disallow rules with wildcard and end-of-URL matching. This example blocks PDF, DOCX, and PPTX files.

User-agent: *
Disallow: /*.pdf$
Disallow: /*.docx$
Disallow: /*.pptx$

Block Image Crawling with robots.txt

robots.txt blocks crawling of all files under an images directory. This stops compliant crawlers, including search engine image crawlers, from fetching images hosted in that path.

User-agent: *
Disallow: /images/

AI Bot Control

Block All AI Training Crawlers with robots.txt

robots.txt blocks AI crawlers by targeting each bot's User-agent individually. This configuration blocks GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google's AI training control token), Meta-ExternalAgent (Meta), and PerplexityBot (Perplexity's index crawler) while allowing all other bots, including search engine bots, to crawl normally. The list covers widely used crawlers and is not exhaustive.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
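
Because each bot needs its own User-agent group, a long block list is easier to maintain when the file is generated. The following is a minimal sketch, assuming the list of bot names is kept in code; the function name build_robots_txt and the AI_CRAWLERS list are illustrative and mirror the snippet above.

```python
# Bot names taken from the snippet above; extend the list as needed.
AI_CRAWLERS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "Meta-ExternalAgent",
    "PerplexityBot",
]

def build_robots_txt(blocked_agents, sitemap_url=None):
    """Build a robots.txt that blocks the given agents and allows everyone else."""
    lines = []
    for agent in blocked_agents:
        # One group per blocked bot, separated by a blank line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Catch-all group: an empty Disallow places no restrictions.
    lines += ["User-agent: *", "Disallow:"]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"

print(build_robots_txt(AI_CRAWLERS, "https://example.com/sitemap.xml"))
```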

Block OpenAI Bots but Allow Search Engines with robots.txt

robots.txt blocks two OpenAI crawlers (GPTBot for training, OAI-SearchBot for search indexing) while keeping the site accessible to Googlebot and Bingbot.

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Disallow:

Block Google AI Training but Keep Google Search Indexing with robots.txt

robots.txt blocks Google-Extended to prevent Google from using site content for Gemini AI training. Google-Extended is a control token rather than a separate crawler, so blocking it does not affect Googlebot search crawling or indexing.

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Disallow:

Crawl Rate Control

Set Crawl Delay for Bingbot with robots.txt

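robots.txt uses Crawl-delay to set the minimum seconds between requests from Bingbot. Google ignores Crawl-delay, and the Search Console crawl rate limiter was retired in January 2024; to slow Googlebot temporarily, Google's documentation recommends returning 500, 503, or 429 responses.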

User-agent: bingbot
Crawl-delay: 10
Disallow: /staging/

User-agent: *
Disallow: /staging/
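
A crawler written in Python can read the delay back with the standard-library urllib.robotparser. This is a minimal sketch with the rules pasted inline; ExampleBot is a made-up name standing in for any bot without its own group.

```python
from urllib.robotparser import RobotFileParser

# The snippet above, pasted inline for illustration.
robots_txt = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /staging/

User-agent: *
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# crawl_delay() returns the delay in seconds for the matching group,
# or None when that group sets no Crawl-delay.
bing_delay = parser.crawl_delay("bingbot")
other_delay = parser.crawl_delay("ExampleBot")

print("bingbot delay:", bing_delay)
print("ExampleBot delay:", other_delay)
```

Only the bingbot group carries a delay, so bots that fall through to the * group get no Crawl-delay value.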

Set Crawl Delay for Yandex with robots.txt

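robots.txt sets a Crawl-delay for YandexBot independently from other bots; the value is the number of seconds between consecutive requests. Yandex announced in 2018 that it stopped honoring Crawl-delay in favor of the crawl rate setting in Yandex Webmaster, so this snippet matters mainly for legacy setups.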

User-agent: YandexBot
Crawl-delay: 5
Disallow:

Sitemap Directive

Declare a Sitemap in robots.txt

robots.txt uses the Sitemap directive to tell search engine bots the location of the XML sitemap. The directive requires the full URL including the protocol. Place the Sitemap line outside any User-agent group because it applies globally.

Sitemap: https://example.com/sitemap.xml

Declare a Sitemap on a Different Subdomain in robots.txt

robots.txt accepts sitemap URLs pointing to any accessible location, including a different subdomain. The sitemap URL must be a fully qualified absolute URL.

User-agent: *
Disallow:

Sitemap: https://cdn.example.com/sitemap.xml