Robots.txt: Is this wildcard rule valid?

Published 2019-01-09 13:20

Question:

Simple question. I want to add:

Disallow: */*details-print/

Basically, I want to block URLs of the form /foo/bar/dynamic-details-print, where foo and bar in this example can also be totally dynamic.

I thought this would be simple, but then on www.robotstxt.org there is this message:

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: bot", "Disallow: /tmp/*" or "Disallow: *.gif".

So we can't do that? Do search engines abide by it? But then, there's Quora.com's robots.txt file:

Disallow: /ajax/
Disallow: /*/log
Disallow: /*/rss
Disallow: /*_POST

So, who is right? Or am I misunderstanding the text on robotstxt.org?

Thanks!

Answer 1:

The answer is, "it depends". The robots.txt "standard" as defined at robotstxt.org is the minimum that bots are expected to support. Googlebot, MSNbot, and Yahoo Slurp support some common extensions, and there's really no telling what other bots support. Some say what they support and others don't.

In general, you can expect the major search engine bots to support the wildcards you've written, and the rule you have there looks like it will work. Your best bet is to run it past a robots.txt validator, or to check it with Google's Webmaster Tools.
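To make the extended semantics concrete, here is a minimal sketch of how the major engines are documented to interpret wildcard rules: '*' matches any run of characters, and a trailing '$' anchors the rule at the end of the path. This is an illustration of the matching logic, not any engine's actual implementation; the rule_to_regex and is_disallowed helpers are hypothetical names.

```python
import re

def rule_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a regex, using the
    wildcard semantics the big crawlers document: '*' matches any
    run of characters, and a trailing '$' anchors the match at the
    end of the URL path. Everything else is matched literally."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the literal '*' back
    # into the regex wildcard '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path: str, rule: str) -> bool:
    """True if the Disallow pattern matches the start of the path."""
    return rule_to_regex(rule).match(path) is not None

# A rule like the one in the question: '*' spans any segments.
print(is_disallowed("/foo/bar/dynamic-details-print/", "/*details-print/"))
print(is_disallowed("/foo/bar/page/", "/*details-print/"))
# The '$' extension from robotstxt.org's forbidden example, which
# the major engines do in fact support:
print(is_disallowed("/images/pic.gif", "/*.gif$"))
print(is_disallowed("/images/pic.gif?size=large", "/*.gif$"))
```

Standards-minimum parsers, by contrast, treat the pattern as a plain path prefix, which is exactly why robotstxt.org warns against relying on wildcards.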