SEO Bots
Manage how search engine bots and AI crawlers discover, crawl, and index your website using robots.txt, meta robots tags, and X-Robots-Tag headers.
SEO bots (search engine bots) are automated programs that crawl websites to discover pages, follow links, and index content for search engines and AI systems.
How to Control Search Engine Bot Behavior
Three tools control how search engine bots interact with a website. Each tool operates at a different level and solves a different problem.
robots.txt (Robots Exclusion Protocol) controls crawling at the site level. The robots.txt file tells bots which URL paths they may or may not access before they request any page. Use robots.txt to manage crawl budget, block entire directories, or prevent bots from accessing resource-heavy paths. See the robots.txt article for syntax, best practices, and code snippets.
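To see how a compliant bot reads these rules, the sketch below feeds a sample robots.txt into Python's standard-library `urllib.robotparser` and asks whether specific bots may fetch specific URLs. The file contents, the `example.com` URLs, and the helper name `build_parser` are hypothetical; `GPTBot` is used only as an example of an AI crawler token. Python's parser applies the first matching rule in file order, which can differ from the most-specific-rule matching some search engines use, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; example.com and the rules are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # The catch-all (*) group allows this path for Googlebot.
    print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
    # /admin/ is disallowed for every bot that has no more specific group.
    print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
    # GPTBot has its own group that disallows the whole site.
    print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))       # False
    # Sitemap lines apply regardless of user-agent group (Python 3.8+).
    print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

Parsing the text directly keeps the example offline; against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the file instead.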
Meta robots tags control indexing at the page level. A `<meta name="robots">` tag in the HTML `<head>` tells search engine bots whether to index the page and whether to follow its links. Common values include `noindex` (exclude from search results) and `nofollow` (do not follow links on the page). Meta robots tags require the bot to crawl the page first, so they cannot work if robots.txt blocks access to that page.
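A bot reads these directives only after downloading the HTML. The following minimal sketch mimics that step with Python's standard `html.parser`: it collects the comma-separated values from any `<meta name="robots">` tag. The class and function names are hypothetical, and for simplicity the sketch scans the whole document rather than only `<head>` and ignores bot-specific meta names.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects comma-separated directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_robots_directives(html: str) -> set[str]:
    """Return the set of robots directives declared in an HTML document."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = """<!doctype html>
<html><head>
  <meta name="robots" content="noindex, nofollow">
  <title>Internal search results</title>
</head><body>...</body></html>"""
    directives = meta_robots_directives(page)
    print("noindex" in directives, "nofollow" in directives)  # True True
```

Directive values are lowercased because the sample treats `NOINDEX` and `noindex` as equivalent; an empty set means the page declares no meta robots restrictions.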
X-Robots-Tag HTTP headers control indexing at the server level. The `X-Robots-Tag` response header provides the same directives as the meta robots tag but applies to any file type, including PDFs, images, and other non-HTML resources. Set the X-Robots-Tag in the web server configuration (Nginx, Apache HTTP Server) or application response headers.
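For the application-level option, one possible approach in a Python web stack is a small WSGI middleware that attaches the header based on the requested path. This is a minimal sketch, not a prescribed setup: the suffix list, `x_robots_tag_middleware`, and `demo_app` are hypothetical, and sites served by Nginx or Apache would normally set the header in server configuration instead.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]

# Hypothetical policy: file suffixes that should be kept out of search results.
NOINDEX_SUFFIXES = (".pdf", ".jpg", ".jpeg", ".png")


def x_robots_tag_middleware(app: Callable) -> Callable:
    """WSGI middleware that adds 'X-Robots-Tag: noindex' to matching responses."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def start_with_tag(status: str, headers: Headers, exc_info=None):
            if path.endswith(NOINDEX_SUFFIXES):
                # Drop any existing value so the header is not sent twice.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_tag)

    return wrapped


def demo_app(environ, start_response) -> Iterable[bytes]:
    """Stand-in application that returns a placeholder body for any path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"placeholder"]


if __name__ == "__main__":
    app = x_robots_tag_middleware(demo_app)

    def show_headers(status, headers, exc_info=None):
        print(status, headers)

    # PDF path: the middleware appends the X-Robots-Tag header.
    app({"PATH_INFO": "/files/report.pdf"}, show_headers)
    # HTML-style path: headers pass through unchanged.
    app({"PATH_INFO": "/about"}, show_headers)
```

Keying the decision on the path keeps the sketch short; a real deployment might instead check the response `Content-Type` or a per-file allowlist.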
When to Use Each Tool
| Tool | Scope | Controls | Use Case |
|---|---|---|---|
| robots.txt | Site-wide | Crawling (access to URLs) | Block directories, manage crawl budget, declare sitemap location |
| Meta robots tag | Per page | Indexing and link following | Prevent a specific HTML page from appearing in search results |
| X-Robots-Tag header | Per response | Indexing and link following | Prevent non-HTML files (PDF, images) from appearing in search results |
robots.txt prevents crawling. Meta robots and X-Robots-Tag prevent indexing. These are distinct actions. A page blocked by robots.txt may still appear in search results if other pages link to it, because the search engine can index the URL from those links without ever crawling the page. A page with a `noindex` meta tag must be crawlable for the bot to read the tag and remove the page from the index.
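The interaction between the two layers can be expressed as a small decision function. This is a simplified model of a compliant bot, not how any particular search engine is implemented: `indexing_outcome`, its return labels, and the sample URLs are hypothetical, and `directives` stands in for whatever the page would declare through a meta robots tag or an X-Robots-Tag header.

```python
from urllib.robotparser import RobotFileParser


def indexing_outcome(robots_txt: str, user_agent: str, url: str, directives: set[str]) -> str:
    """Classify what a compliant bot can do with a URL (simplified model).

    `directives` is what the page would declare via a meta robots tag or an
    X-Robots-Tag header, e.g. {"noindex"}. It only matters if the bot is
    allowed to crawl the URL and therefore sees the response.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # Crawling is blocked, so the noindex directive is never read; the URL
        # itself can still be indexed if other pages link to it.
        return "blocked"
    if "noindex" in directives:
        # The bot crawled the page, read noindex, and keeps it out of results.
        return "excluded"
    return "indexable"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    # Hypothetical URLs on example.com.
    print(indexing_outcome(robots, "Googlebot", "https://example.com/private/a", {"noindex"}))  # blocked
    print(indexing_outcome(robots, "Googlebot", "https://example.com/public/a", {"noindex"}))   # excluded
    print(indexing_outcome(robots, "Googlebot", "https://example.com/public/b", set()))         # indexable
```

The first call shows the common mistake: `/private/a` carries `noindex`, but the Disallow rule stops the bot from ever reading it. To remove such a page from the index, allow crawling so the `noindex` directive can take effect.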