robots.txt best practices
Heads-up: Compliance with robots.txt is optional
An important heads-up: a bot's compliance with the rules and directives in a robots.txt file is optional.
Well-behaved bots will most likely follow the instructions set in the robots.txt file, while bad bots will likely ignore the file.
Mention user-agent one by one
Group directives by user-agent, declaring each user-agent separately:
User-agent: Googlebot
Disallow: /*.pdf$

User-agent: Googlebot-Image
Disallow: /*.pdf$
Use robots.txt for each origin (domains, subdomains)
A robots.txt file works only for one origin. Websites with multiple subdomains should use a separate robots.txt file for each subdomain.
The rules in the robots.txt for subdomain1.domain.com (hosted at subdomain1.domain.com/robots.txt) apply only to subdomain1.domain.com; they don't apply to any other origin.

For example, a website with the main www origin and 2 subdomains, subdomain1 and subdomain2, must have its own robots.txt file for each of these 3 origins:

www.domain.com/robots.txt
subdomain1.domain.com/robots.txt
subdomain2.domain.com/robots.txt
Use noindex to block indexing instead of a robots.txt
Use noindex to block the indexing of certain URLs instead of relying on the robots.txt file.
For example, Google may still index a certain URL if that URL is linked to from other pages of your website:
Warning: Don't use a robots.txt file as a means to hide your web pages from Google search results. If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex.
Source: https://developers.google.com/search/docs/advanced/robots/intro
You can use the noindex meta tag or the X-Robots-Tag HTTP response header.
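As a minimal illustration, the meta tag form goes in the page's head. One nuance worth noting: for noindex to work, the crawler must be able to fetch the page, so the URL must not also be disallowed in robots.txt.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Tells compliant crawlers not to index this page -->
  <meta name="robots" content="noindex">
  <title>Example private page</title>
</head>
<body>...</body>
</html>
```

For non-HTML resources such as PDFs, the same directive can be sent as an HTTP response header instead: X-Robots-Tag: noindex.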
Conflicting rules when using Allow and Disallow directives
Pay attention whenever you use the Allow and Disallow directives at the same time:
User-agent: *
Allow: /articles
Disallow: /articles/
For Google and Bing search engine bots, the directive with the most characters will be prioritized and followed. In this example, that would be the Disallow directive because it has more characters.
Other search engines may interpret conflicting rules differently. If these bots follow only the first matching directive, that would be the Allow directive from this example.
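The longest-rule-wins behavior can be sketched in Python. This is a simplified illustration, not Google's actual matcher: wildcards like * and $ are ignored and rule paths are matched as plain prefixes.

```python
# Simplified sketch of the "longest rule wins" tie-breaking used by
# Google and Bing. Rules are (directive, path) pairs; matching here is
# a plain prefix check (wildcards are ignored for clarity).

def is_allowed(url_path, rules):
    """Return True if url_path may be crawled under the given rules."""
    matches = [(directive, path) for directive, path in rules
               if url_path.startswith(path)]
    if not matches:
        return True  # no matching rule: crawling is allowed by default
    # Longest path wins; on an exact tie, Allow (least restrictive) wins.
    directive, _ = max(matches, key=lambda m: (len(m[1]), m[0] == "Allow"))
    return directive == "Allow"

rules = [("Allow", "/articles"), ("Disallow", "/articles/")]
print(is_allowed("/articles/seo-tips", rules))  # False: Disallow is longer
print(is_allowed("/articles", rules))           # True: only Allow matches
```

A bot that instead follows the first matching directive would return True for /articles/seo-tips here, which is exactly the cross-engine inconsistency described above.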
Use UTF-8 format
To ensure that most search engine bots can read and interpret the directives in the robots.txt file, follow these instructions:
- save robots.txt as a plain text file (.txt) encoded in UTF-8
- separate lines with CR, CR/LF, or LF
Limit size to 500 KB
It's recommended to limit the size of the robots.txt file to 500 KB. Google, for example, ignores content beyond its size limit.
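The last two recommendations can be checked locally before deploying, for example with a short Python sketch (the 500 KB ceiling below mirrors the recommendation above):

```python
# Sketch: validate a robots.txt file's encoding and size before deploying.

MAX_SIZE = 500 * 1024  # the 500 KB limit recommended above

def validate_robots_txt(data: bytes):
    """Return a list of problems found in the raw robots.txt bytes."""
    problems = []
    if len(data) > MAX_SIZE:
        problems.append(f"file is {len(data)} bytes, over the 500 KB limit")
    try:
        data.decode("utf-8")  # must be valid UTF-8 plain text
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")
    return problems

print(validate_robots_txt(b"User-agent: *\nDisallow: /private/\n"))  # []
```

An empty list means both checks passed; anything else lists what to fix before uploading the file.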