Robots.txt: Is this wildcard rule valid?

Asked 2019-01-09 13:33

Simple question. I want to add:

Disallow: */*details-print/

Basically, I want to block URLs of the form /foo/bar/dynamic-details-print, where foo and bar in this example can also be totally dynamic.
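
For context, here's a rough sketch of how I imagine the rule sitting in the file (the User-agent line is just an assumption for illustration):

User-agent: *
Disallow: */*details-print/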

I thought this would be simple, but then on www.robotstxt.org there is this message:

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

So we can't do that? Do search engines actually abide by that restriction? But then, there's Quora.com's robots.txt file:

Disallow: /ajax/
Disallow: /*/log
Disallow: /*/rss
Disallow: /*_POST

So, who is right? Or am I misunderstanding the text on robotstxt.org?

Thanks!

1 Answer
Fickle 薄情
Answered 2019-01-09 14:01

The answer is, "it depends". The robots.txt "standard" as defined at robotstxt.org is the minimum that bots are expected to support. Googlebot, MSNbot, and Yahoo Slurp support some common extensions, and there's really no telling what other bots support. Some say what they support and others don't.
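
To illustrate roughly how bots that do support the extension interpret a wildcard Disallow rule, here is a small Python sketch (my own approximation, not any engine's actual implementation) that turns a pattern using '*' and a trailing '$' into a regex and checks a URL path against it:

import re

def rule_to_regex(rule):
    # Escape regex metacharacters, then re-enable the two common
    # robots.txt extensions: '*' matches any run of characters and a
    # trailing '$' anchors the pattern to the end of the URL path.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

def is_blocked(path, rule):
    # Rules are matched starting from the beginning of the URL path.
    return rule_to_regex(rule).match(path) is not None

print(is_blocked("/foo/bar/dynamic-details-print/", "*/*details-print/"))  # True
print(is_blocked("/foo/bar/something-else/", "*/*details-print/"))         # False

Again, this is only a mental model of the common extensions; the base standard described on robotstxt.org guarantees none of it, which is why testing against a real validator matters.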

In general, you can expect the major search engine bots to support the wildcards you've written, and the rule you have there looks like it will work. Your best bet is to run it past one or more robots.txt validators or check it with Google's Webmaster Tools.
