Will this work to disallow pages under a directory, but still allow the page at that directory URL?
Allow: /special-offers/$
Disallow: /special-offers/
This should allow:
www.mysite.com/special-offers/
but block:
www.mysite.com/special-offers/page1
www.mysite.com/special-offers/page2.html
and so on.
Having said that, looking at Google's own robots.txt file, they are doing exactly what I was questioning.
At lines 136-137 they have:
Disallow: /places/
Allow: /places/$
So they are blocking anything under /places/, but allowing the root /places/ URL. The only difference from my syntax is the order, with the Disallow coming first.
According to the HTML 4.01 specification, Appendix B.4.1, the only values allowed in the Disallow field (no pun intended) are partial URIs, representing partial or full paths:
The "Disallow" field specifies a partial URI that is not to be visited. This can be a full path, or a partial path; any URI that starts with this value will not be retrieved. For example,
Disallow: /help disallows both /help.html and /help/index.html, whereas
Disallow: /help/ would disallow /help/index.html but allow /help.html.
I don't think anything has changed since then, since the current HTML5 Specification Drafts don't mention robots.txt at all.
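Applied to your case, that means the original standard alone has no way to express what you want: a prefix rule blocks the directory URL itself along with everything beneath it, and there is no standard Allow rule to re-open just the root. For example (the User-agent: * line is only added to make it a complete group):
User-agent: *
# Plain prefix matching: this blocks www.mysite.com/special-offers/ itself
# as well as /special-offers/page1, /special-offers/page2.html, and so on.
Disallow: /special-offers/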
However, in practice, many Robot Engines (such as Googlebot) are more flexible in what they accept. If you use, for instance:
Disallow: /*.gif$
then Googlebot will skip any file with the gif extension. I think you could do something like this to disallow all files under a folder, but I'm not 100% sure (you could test it with Google Webmaster Tools):
Disallow: /special-offers/*.*$
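As you noted with Google's own file, the same engines also accept an Allow line combined with the $ anchor, so your original pair of rules is another candidate worth testing there. As a complete group it would look something like this (User-agent: * added just to make it self-contained; these are nonstandard extensions, not part of the original robots.txt standard):
User-agent: *
# Allow exactly the directory URL itself...
Allow: /special-offers/$
# ...but block everything else that starts with this prefix.
Disallow: /special-offers/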
Anyway, you shouldn't rely on this too much (since each search engine might behave differently), so if possible it would be preferable to use meta tags or HTTP headers instead. For instance, you could configure your webserver to include this header in all responses that should not be indexed (or followed):
X-Robots-Tag: noindex, nofollow
Search for the best way of doing it in your particular webserver. Here's an example in Apache, combining mod_rewrite with mod_headers to conditionally set some headers depending on the URL pattern. Disclaimer: I haven't tested it myself, so I can't tell how well it works.
# all /special-offers/ sub-urls set env var ROBOTS=none
RewriteRule ^/special-offers/.+$ - [E=ROBOTS:none]
# if env var ROBOTS is set, add the response header X-Robots-Tag with its value
Header set X-Robots-Tag %{ROBOTS}e env=ROBOTS
(Note: none is equivalent to noindex, nofollow.)
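If you go the header route, a quick way to check the result is a HEAD request against one of the blocked URLs (using the example hostname from your question):
# Fetch only the response headers and look for "X-Robots-Tag: none"
curl -I http://www.mysite.com/special-offers/page1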