robots.txt snippets

Blocking

Block DOCX, PDF, PPTX files from being indexed using robots.txt

To block indexing of static files such as DOCX, PDF, PPTX, update the robots.txt file with:

User-agent: * 
Disallow: /*.pdf$
Disallow: /*.docx$
Disallow: /*.pptx$
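
The * and $ wildcards are not part of the original robots.txt standard; they are extensions honoured by major crawlers such as Googlebot and Bingbot. As a rough sketch of how such a rule is matched against a URL path (the helper name rule_to_regex is purely illustrative, not a real library function), the pattern can be translated into a regular expression:

import re

def rule_to_regex(rule):
    # Translate a robots.txt path rule into a regular expression.
    # '*' matches any run of characters and '$' anchors the end of the URL;
    # everything else is matched literally. (Simplified: a '$' anywhere in
    # the rule is treated as an end anchor.)
    pattern = ""
    for ch in rule:
        if ch == "*":
            pattern += ".*"
        elif ch == "$":
            pattern += "$"
        else:
            pattern += re.escape(ch)
    return re.compile(pattern)

pdf_rule = rule_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))      # True  -> blocked
print(bool(pdf_rule.match("/files/report.pdf?v=2")))  # False -> query string defeats the $ anchor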

Block indexing of all URLs in a folder except one

To instruct search engine bots to stop crawling and indexing all URLs under /wp-admin/ but still allow crawling of the /wp-admin/admin-ajax.php file, use:

User-agent: *
Disallow: /wp-admin/ 
Allow: /wp-admin/admin-ajax.php 

Directives

Allow

You can specify the Allow directive in robots.txt:

User-agent: *
Allow: /

Note: The Allow directive was not part of the original robots.txt specification (it has since been standardized in RFC 9309). Most search engine bots will read and follow it.

By default, search engine bots crawl and index everything they can reach. Creating a robots.txt file with an Allow directive just to instruct bots to crawl a website is therefore redundant.
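
For reference, the conventional way to express "crawl everything" explicitly is an empty Disallow rule rather than an Allow rule:

User-agent: *
Disallow: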

The Allow directive is useful when you want to disallow crawling of certain folders or URLs but allow others. For example:

User-agent: *
Disallow: /wp-admin/ 
Allow: /wp-admin/admin-ajax.php 

In this example, bots are instructed to not crawl and index anything under the /wp-admin/ folder but still crawl and index the /wp-admin/admin-ajax.php file.
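
Google and most modern crawlers resolve conflicts between Allow and Disallow by picking the most specific (longest) matching rule, which is why the admin-ajax.php rule wins over the broader /wp-admin/ rule. Below is a minimal Python sketch of that precedence (the is_allowed helper is illustrative, not a real library function), assuming plain prefix rules without wildcards; Google additionally breaks length ties in favour of Allow, which this sketch omits:

def is_allowed(path, rules):
    # rules is a list of (directive, value) pairs, e.g. ("Disallow", "/wp-admin/").
    # Among all rules whose value is a prefix of the path, the longest one wins;
    # if nothing matches, crawling is allowed by default.
    best = None
    for directive, value in rules:
        if value and path.startswith(value):
            if best is None or len(value) > len(best[1]):
                best = (directive, value)
    return best is None or best[0] == "Allow"

rules = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
print(is_allowed("/wp-admin/options.php", rules))     # False -> blocked
print(is_allowed("/wp-admin/admin-ajax.php", rules))  # True  -> allowed
print(is_allowed("/blog/some-post", rules))           # True  -> allowed by default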

Disallow

The Disallow directive tells search engine bots which URLs or folders they should not crawl. To block crawling of the entire site, use:

User-agent: *
Disallow: /

This example instructs bots to not crawl and index anything on your website, regardless of URL.

To disallow crawling of only a certain URL or folder, specify its path in the Disallow directive:

User-agent: *
Disallow: /wp-admin/ 
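
To check how these rules are interpreted, Python's standard-library urllib.robotparser can be fed a robots.txt body directly (the user-agent string MyCrawler below is just a placeholder). Note that it only understands plain prefix rules like the one above; it does not implement the * and $ wildcards or Google's longest-match precedence:

import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow: /wp-admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("MyCrawler", "https://www.mydomain.com/wp-admin/options.php"))  # False
print(rp.can_fetch("MyCrawler", "https://www.mydomain.com/blog/some-post"))        # True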

Crawl-delay

The Crawl-delay directive is not part of the official robots.txt specification. Some crawlers, such as Bingbot, interpret it as the number of seconds to wait between successive requests; Googlebot ignores it.
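
For example, to ask supporting bots to wait 10 seconds between requests:

User-agent: *
Crawl-delay: 10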

Host

The Host directive is non-standard and historically supported by Yandex, where it indicates the preferred domain (main mirror) of a site. Most other search engines ignore it.
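
For example, to declare www.mydomain.com as the preferred mirror:

Host: www.mydomain.com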

Sitemap

To inform bots of the location of an XML sitemap file, use the Sitemap directive:

User-agent: *
Sitemap: https://www.mydomain.com/sitemap.xml
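
The Sitemap directive is independent of any User-agent group, so it can appear anywhere in the file, and several sitemaps can be listed. For example (the file names below are only illustrative):

Sitemap: https://www.mydomain.com/sitemap-pages.xml
Sitemap: https://www.mydomain.com/sitemap-posts.xml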