According to this page:
globbing and regular expression are not supported in either the User-agent or Disallow lines
However, I noticed that the Stack Overflow robots.txt includes characters like * and ? in its URLs. Are these supported or not?
Also, does it make any difference whether a URL includes a trailing slash, or are these two equivalent?
Disallow: /privacy
Disallow: /privacy/
To answer your second question: the two are not equivalent. /privacy will block anything that starts with /privacy, including something like /privacy_xyzzy. /privacy/, on the other hand, would not block that.
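To make the difference concrete, here is a minimal sketch of original-style matching, assuming a rule blocks any path that begins with the Disallow value. The function name prefix_blocked is made up for illustration and does not come from any crawler.

```python
def prefix_blocked(path: str, disallow: str) -> bool:
    """Return True if `path` starts with the Disallow value (plain prefix match)."""
    # An empty Disallow value blocks nothing.
    return bool(disallow) and path.startswith(disallow)


# "/privacy" is a prefix of all three paths below.
print(prefix_blocked("/privacy", "/privacy"))          # True
print(prefix_blocked("/privacy/terms", "/privacy"))    # True
print(prefix_blocked("/privacy_xyzzy", "/privacy"))    # True

# "/privacy/" only matches paths inside that directory.
print(prefix_blocked("/privacy/terms", "/privacy/"))   # True
print(prefix_blocked("/privacy_xyzzy", "/privacy/"))   # False
print(prefix_blocked("/privacy", "/privacy/"))         # False
```

Note the last case: under this reading, /privacy/ does not block the bare /privacy URL either.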
The original robots.txt specification did not support globbing or wildcards. However, many robots do. Google, Microsoft, and Yahoo agreed on a standard in 2008. See http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html for details.
Most major robots that I know of support that "standard."
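As a rough sketch of how a robot might interpret such patterns, the snippet below assumes that * matches any run of characters and that a trailing $ anchors the end of the URL. Every other character, including ?, is treated as a literal here. That is my reading of the extension, not a reference implementation, and wildcard_blocked is a made-up helper name.

```python
import re


def wildcard_blocked(path: str, rule: str) -> bool:
    """Return True if `path` matches `rule` under the assumed * and $ semantics."""
    if not rule:
        return False  # an empty Disallow value blocks nothing
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where the rule had "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    pattern = body + (r"\Z" if anchored else "")
    # re.match anchors at the start, so unanchored rules still act as prefixes.
    return re.match(pattern, path) is not None


print(wildcard_blocked("/search?q=robots", "/*?"))       # True
print(wildcard_blocked("/search", "/*?"))                # False
print(wildcard_blocked("/docs/file.pdf", "/*.pdf$"))     # True
print(wildcard_blocked("/docs/file.pdf.html", "/*.pdf$"))  # False
print(wildcard_blocked("/privacy_xyzzy", "/privacy"))    # True
```

A rule without any wildcard falls back to the plain prefix behaviour, so older rules keep their meaning under this interpretation.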