When using a robots.txt file, does the user-agent string have to be exactly as it appears in my server logs?

For example, when trying to match GoogleBot, can I just use googlebot?

Also, will a partial match work? For example, just using Google?
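In other words, would a record like the following (just a sketch; /private/ is a placeholder path) still apply to GoogleBot despite the lowercase spelling?

    # Placeholder example: does lowercase "googlebot" still match?
    User-agent: googlebot
    Disallow: /private/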
(As already answered in another question)
In the original robots.txt specification (from 1994), it says about the User-agent field: "The robot should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended."
But whether (and which) parsers actually work like that is another question. Your best bet would be to look for the documentation of the bots you want to add; you'll typically find the agent identifier string in it (an example record using such tokens follows this list), e.g.:
Bing:
DuckDuckGo:
Google:
Internet Archive:
…
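For example, a record like the following (a sketch; the paths are placeholders) uses the tokens documented by Google and Bing, Googlebot and bingbot respectively:

    # Google's crawler, addressed by its documented token
    User-agent: Googlebot
    Disallow: /private/

    # Bing's crawler, addressed by its documented token
    User-agent: bingbot
    Disallow: /private/

    # All other crawlers
    User-agent: *
    Disallow: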
robots.txt matching is case-sensitive. Google is more lenient than other bots and may accept its token in either case, but other bots may not.
Yes, the user agent has to be an exact match.
From robotstxt.org: "globbing and regular expression are not supported in either the User-agent or Disallow lines"
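So, as a sketch of what that rules out, you cannot pattern-match the name; you spell the token out instead (the path is a placeholder):

    # Not supported: globbing in the User-agent line
    # User-agent: Google*

    # Supported: the plain token, spelled out
    User-agent: Googlebot
    Disallow: /private/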
At least for Googlebot, the user-agent matching is case-insensitive. See the 'Order of precedence for user-agents' section:
https://code.google.com/intl/de/web/controlcrawlindex/docs/robots_txt.html
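A minimal sketch of what that section describes (paths are placeholders): Google's crawlers follow the single most specific matching group, so Googlebot-News would obey only its own group here and ignore the generic Googlebot group:

    # Googlebot-News follows only this, more specific, group
    User-agent: Googlebot-News
    Disallow: /news-drafts/

    # The standard Googlebot follows this group
    User-agent: Googlebot
    Disallow: /drafts/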
In theory, yes. However, in practice it seems to be specific partial matches or "substrings" (as mentioned in @unor's answer) that match. These specific "substrings" appear to be referred to as "tokens", and often the match against these "tokens" must be exact.

With regard to the standard Googlebot, only Googlebot (case-insensitive) appears to match. Any shorter partial match, such as Google, fails to match. Any longer partial match, such as Googlebot/1.2, fails to match. And using the full user-agent string (Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)) also fails to match. (Although there is technically more than one user-agent for Googlebot anyway, so matching on the full user-agent string would not be recommended even if it did work.) These tests were performed with Google's robots.txt tester.
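As a sketch that just restates those results (placeholder path; behaviour as reported by Google's robots.txt tester above):

    # This group matches the standard Googlebot (case-insensitively)
    User-agent: Googlebot
    Disallow: /private/

    # Per the tests above, none of these would match Googlebot:
    # User-agent: Google
    # User-agent: Googlebot/1.2
    # User-agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)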
Reference: robots.txt