Will this work for disallowing pages under a directory, while still allowing a page at that directory URL itself?
Allow: /special-offers/$
Disallow: /special-offers/
to allow:
www.mysite.com/special-offers/
but block:
www.mysite.com/special-offers/page1
www.mysite.com/special-offers/page2.html
etc
Standards
According to the HTML 4.01 specification, Appendix B.4.1, the only values allowed in Disallow (no pun intended) are partial URIs, representing partial or full paths. I don't think anything has changed since then, since the current HTML5 specification drafts don't mention robots.txt at all.
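In other words, under those rules a Disallow value is matched as a plain prefix, so the standard alone gives you no way to block a directory's contents while still allowing the directory URL itself:

# Standard prefix matching: this blocks /special-offers/ itself
# as well as everything below it
Disallow: /special-offers/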
Extensions
However, in practice, many crawlers (such as Googlebot) are more flexible in what they accept. If you use, for instance:
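# Non-standard: * and $ wildcards, understood by Googlebot
Disallow: /*.gif$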
then Googlebot will skip any file with the gif extension. I think you could do something like this to disallow all files under a folder, but I'm not 100% sure (you could test it with Google Webmaster Tools):
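# Untested guess, using the same non-standard wildcard syntax:
# block any URL with a file extension inside the folder
Disallow: /special-offers/*.*$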
Other options
Anyway, you shouldn't rely on this too much (since each search engine might behave differently), so if possible it would be preferable to use meta tags or HTTP headers instead. For instance, you could configure your webserver to include this header in all responses that should not be indexed (or followed):
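X-Robots-Tag: noindex, nofollow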
Search for the best way of doing it in your particular webserver. Here's an example for Apache, combining mod_rewrite with mod_headers to conditionally set that header depending on the URL pattern. Disclaimer: I haven't tested it myself, so I can't tell how well it works.
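Something along these lines (the /special-offers/ path and the NO_INDEX variable name are only placeholders, and the rewrite rule assumes server or virtual-host context):

# Tag every request under /special-offers/ except the bare directory URL
RewriteEngine On
RewriteRule ^/special-offers/.+ - [E=NO_INDEX:1]

# Send the header only for requests tagged above
Header set X-Robots-Tag "none" env=NO_INDEX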
(Note: none is equivalent to noindex, nofollow.)

Having looked at Google's very own robots.txt file, they are doing exactly what I was questioning.
At line 136-137 they have:
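Disallow: /places/
Allow: /places/$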
So they are blocking anything under places, but allowing the root places URL. The only difference from my syntax is the order, with the Disallow coming first.