SEO Bots

Manage how search engine bots and AI crawlers discover, crawl, and index your website using robots.txt, meta robots tags, and X-Robots-Tag headers.

SEO bots (search engine bots) are automated programs that crawl websites to discover pages, follow links, and index content for search engines and AI systems.

How to Control Search Engine Bot Behavior

Three tools control how search engine bots interact with a website. Each tool operates at a different level and solves a different problem.

robots.txt (Robots Exclusion Protocol) controls crawling at the site level. The robots.txt file tells bots which URL paths they may or may not access before they request any page. Use robots.txt to manage crawl budget, block entire directories, or prevent bots from accessing resource-heavy paths. See the robots.txt article for syntax, best practices, and code snippets.
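To see how a crawler might interpret these rules, the following minimal sketch uses Python's standard-library `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the `example.com` URLs are assumptions made up for illustration; real crawlers may resolve conflicting rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths and sitemap URL are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; it falls under the "*" group.
allowed_blog = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
allowed_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/users")

print(allowed_blog)   # True: no Disallow rule matches /blog/post
print(allowed_admin)  # False: /admin/ is disallowed
print(parser.site_maps())  # sitemap URLs declared in the file (Python 3.8+)
```

The check happens before any page is fetched, which is why robots.txt governs crawling rather than indexing.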

Meta robots tags control indexing at the page level. A <meta name="robots"> tag in the HTML <head> tells search engine bots whether to index the page and whether to follow its links. Common values include noindex (exclude from search results) and nofollow (do not follow links on the page). Meta robots tags require the bot to crawl the page first, so they cannot work if robots.txt blocks access to that page.
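As a rough illustration of what a bot does after fetching a page, the sketch below extracts meta robots directives with Python's standard-library `html.parser`. The class name `MetaRobotsParser`, the helper `meta_robots_directives`, and the sample page are hypothetical; this is not how any particular search engine implements it.

```python
from html.parser import HTMLParser
from typing import Set


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_robots_directives(html: str) -> Set[str]:
    """Return the set of meta robots directives found in an HTML document."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Internal report</title>
</head><body>...</body></html>"""

directives = meta_robots_directives(SAMPLE_PAGE)
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
print(may_index, may_follow)  # False False
```

The parser only runs on HTML the bot has already downloaded, which mirrors the dependency described above: a page blocked in robots.txt never reaches this step.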

X-Robots-Tag HTTP headers control indexing at the server level. The X-Robots-Tag response header provides the same directives as the meta robots tag but applies to any file type, including PDFs, images, and other non-HTML resources. Set the X-Robots-Tag in the web server configuration (Nginx, Apache HTTP Server) or application response headers.
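For the application-level option, one possible approach is a small WSGI middleware that appends the header to matching responses. This is a sketch under stated assumptions: the extension list, the `x_robots_middleware` name, and the `demo_app` and `response_headers` helpers are all hypothetical, and a production setup would more commonly set the header in the web server configuration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical policy: file types that should stay out of search results.
NOINDEX_EXTENSIONS = (".pdf", ".png", ".jpg")


def x_robots_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so matching responses carry an X-Robots-Tag header."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "").lower()

        def tagged_start_response(status, headers, exc_info=None):
            if path.endswith(NOINDEX_EXTENSIONS):
                headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, tagged_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that serves the same body for every path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"file contents"]


def response_headers(path: str) -> List[Tuple[str, str]]:
    """Call the wrapped app directly and return the headers it would send."""
    captured: List[Tuple[str, str]] = []

    def capture(status, headers, exc_info=None):
        captured.extend(headers)

    body = x_robots_middleware(demo_app)({"PATH_INFO": path}, capture)
    list(body)  # consume the response iterable
    return captured


print(response_headers("/files/report.pdf"))
print(response_headers("/index.html"))
```

Because the directive travels in the HTTP response rather than in markup, the same mechanism covers PDFs and images that have no HTML <head> to hold a meta tag.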

When to Use Each Tool

| Tool | Scope | Controls | Use Case |
| --- | --- | --- | --- |
| robots.txt | Site-wide | Crawling (access to URLs) | Block directories, manage crawl budget, declare sitemap location |
| Meta robots tag | Per page | Indexing and link following | Prevent a specific HTML page from appearing in search results |
| X-Robots-Tag header | Per response | Indexing and link following | Prevent non-HTML files (PDF, images) from appearing in search results |

robots.txt prevents crawling. Meta robots and X-Robots-Tag prevent indexing. These are distinct actions. A page blocked by robots.txt may still appear in search results if other pages link to it. A page with a noindex meta tag must be crawlable for the bot to read the tag and remove the page from the index.
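The interaction between the two mechanisms can be written out as a small decision function. This is a deliberately simplified model, not a description of any search engine's actual pipeline; the function name `index_outcome` and the outcome labels are assumptions chosen for this example.

```python
def index_outcome(blocked_by_robots_txt: bool,
                  has_noindex: bool,
                  linked_from_other_pages: bool) -> str:
    """Simplified model of how crawl blocking and noindex interact."""
    if blocked_by_robots_txt:
        # The bot never fetches the page, so a noindex directive goes unread.
        if linked_from_other_pages:
            return "may still appear in search results"
        return "unlikely to appear in search results"
    if has_noindex:
        # The bot can crawl the page, read the directive, and drop the page.
        return "removed from the index"
    return "eligible for indexing"


# Blocking a noindex page in robots.txt defeats the noindex directive.
print(index_outcome(True, True, True))    # may still appear in search results
print(index_outcome(False, True, True))   # removed from the index
```

Under this model, the reliable way to remove a page from search results is to leave it crawlable and serve noindex, rather than blocking it in robots.txt.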