Is there any advantage of using X-Robot-Tag instead of robots.txt?

Posted 2019-01-28 15:44

Question:

It looks like there are two mainstream ways of telling crawlers what to index and what not to index: adding an X-Robot-Tag HTTP header, or providing a robots.txt file.

Is there any advantage to using the former?

Answer 1:

With robots.txt you cannot disallow indexing of your documents.

They have different purposes:

  • robots.txt can disallow crawling (with Disallow)
  • X-Robots-Tag ¹ can disallow indexing (with noindex)

(And both offer additional different features, e.g., linking to your Sitemap in robots.txt, disallowing following links in X-Robots-Tag, and many more.)
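For example (the paths and the sitemap URL below are placeholders), a robots.txt can block crawling of a directory and point to a Sitemap:

    User-agent: *
    Disallow: /private/
    Sitemap: https://example.com/sitemap.xml

while a server can send an HTTP response header that keeps an already-crawled document out of the index and tells bots not to follow its links:

    X-Robots-Tag: noindex, nofollow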

Crawling means accessing the document. Indexing means providing a link to (and possibly metadata from or about) the document in an index. In the typical case, a bot indexes a document after having crawled it, but that’s not necessary.

A bot that isn’t allowed to crawl a document may still index it (without ever accessing it). A bot that isn’t allowed to index a document may still crawl it. You can’t disallow both at once: a bot can only see a noindex instruction if it is allowed to crawl the document in the first place.
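To illustrate the crawling side, here is a minimal Python sketch using the standard library’s urllib.robotparser (the bot name, domain, and path are made up). Note that can_fetch() only answers “may I download this URL?”; it says nothing about indexing:

    # Check robots.txt before fetching a page, as a polite crawler would.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # download and parse the robots.txt file

    if rp.can_fetch("MyBot", "https://example.com/private/report.html"):
        print("Crawling allowed: the bot may access the document.")
    else:
        # The bot must not access the document, but it may still index
        # the bare URL if other pages link to it.
        print("Crawling disallowed: the document's content is never read.")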

¹ Note that the header is called X-Robots-Tag, not X-Robot-Tag. By the way, the metadata name robots (for the HTML meta element) is an alternative to the HTTP header.
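For an HTML document the two forms express the same instruction, whereas only the HTTP header works for non-HTML resources such as PDFs:

    HTTP header:   X-Robots-Tag: noindex
    HTML element:  <meta name="robots" content="noindex">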