robots.txt snippets
Blocking
Block DOCX, PDF, PPTX files from being indexed using robots.txt
To block indexing of static files such as DOCX, PDF, and PPTX, update the `robots.txt` file with:

```
User-agent: *
Disallow: /*.pdf$
Disallow: /*.docx$
Disallow: /*.pptx$
```
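The `*` wildcard and the `$` end-of-URL anchor are not part of the original robots.txt specification, but major crawlers such as Googlebot support them. Here is a minimal Python sketch of how such patterns are typically matched against URL paths (the helper function and sample paths are illustrative, not taken from any particular library):

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # robots.txt wildcards: '*' matches any character sequence;
    # a trailing '$' anchors the match to the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

disallowed = [pattern_to_regex(p) for p in ("/*.pdf$", "/*.docx$", "/*.pptx$")]

for path in ("/files/report.pdf", "/files/report.pdf?download=1", "/about"):
    blocked = any(r.match(path) for r in disallowed)
    print(path, "->", "blocked" if blocked else "allowed")
```

Note that `/files/report.pdf?download=1` is not blocked: the query string defeats the `$` anchor, which is worth keeping in mind if your file URLs carry parameters.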
Block indexing of all URLs except one
To instruct search engine bots to stop crawling and indexing all URLs under `/wp-admin/` while still allowing the `/wp-admin/admin-ajax.php` file, use:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```
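Crawlers that support `Allow` resolve conflicts between rules by specificity: the longest matching rule wins, and `Allow` wins ties (this is Google's documented behavior; other bots may simply apply rules in file order). A minimal sketch of that longest-match logic, with hypothetical rule lists:

```python
def is_allowed(path: str, allows: list[str], disallows: list[str]) -> bool:
    # Longest matching rule wins; on a tie, Allow wins (Google's documented tie-break).
    best_allow = max((len(p) for p in allows if path.startswith(p)), default=-1)
    best_disallow = max((len(p) for p in disallows if path.startswith(p)), default=-1)
    return best_allow >= best_disallow

# admin-ajax.php matches the longer Allow rule, so it stays crawlable.
print(is_allowed("/wp-admin/admin-ajax.php",
                 ["/wp-admin/admin-ajax.php"], ["/wp-admin/"]))  # True
# Everything else under /wp-admin/ matches only the Disallow rule.
print(is_allowed("/wp-admin/options.php",
                 ["/wp-admin/admin-ajax.php"], ["/wp-admin/"]))  # False
```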
Directives
Allow
You can specify the `Allow` directive in `robots.txt`:

```
User-agent: *
Allow: /
```

Note: the `Allow` directive was not part of the original robots.txt specification, but most search engine bots recognize and follow it.

By default, search engines crawl and index everything, so creating a robots.txt file with only an `Allow` directive like the one above is redundant.
The `Allow` directive is useful when you want to disallow crawling of certain folders or URLs but allow others. For example:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

In this example, bots are instructed not to crawl or index anything under the `/wp-admin/` folder, but to still crawl and index the `/wp-admin/admin-ajax.php` file.
Disallow
To instruct search engine bots not to crawl a specific URL or folder, use the `Disallow` directive. To block your entire website:

```
User-agent: *
Disallow: /
```

This instructs bots not to crawl or index anything on your website, regardless of URL. (An empty `Disallow:` value does the opposite: it allows everything.)

To disallow crawling of only a certain folder or URL, specify its path in the `Disallow` directive:

```
User-agent: *
Disallow: /wp-admin/
```
Crawl-delay
The `Crawl-delay` directive asks crawlers to wait a number of seconds between successive requests, which can reduce server load. It is a non-standard extension: some bots (such as Bing) honor it, while Google ignores it.
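For example, to ask compliant crawlers to pause 10 seconds between requests (the value is illustrative):

```
User-agent: *
Crawl-delay: 10
```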
Host
The `Host` directive is a non-standard extension historically used by Yandex to indicate the preferred domain when a site is reachable under several hostnames. Other major search engines ignore it, and Yandex has since deprecated it in favor of redirects.
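A sketch of what it looked like (the domain is a placeholder):

```
Host: www.mydomain.com
```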
Sitemap
To inform bots of the location of an XML sitemap, use the `Sitemap` directive with an absolute URL:

```
User-agent: *
Sitemap: https://www.mydomain.com/sitemap.xml
```
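The `Sitemap` directive is independent of any `User-agent` group and may appear multiple times, so you can list several sitemaps in one file (the news sitemap URL below is a hypothetical example):

```
Sitemap: https://www.mydomain.com/sitemap.xml
Sitemap: https://www.mydomain.com/news-sitemap.xml
```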