Blocking folders in between allowed content

Posted 2019-08-09 08:19

Question:

I have a site with the following structure:

http://www.example.com/folder1/folder2/folder3

I would like to disallow indexing of folder1 and folder2, but I would like robots to index everything under folder3.

Is there a way to do this with the robots.txt?

From what I've read, I think that everything inside a disallowed folder is blocked.

Would the following achieve my goal?

user-agent: *
Crawl-delay: 0

Sitemap: <Sitemap url>

Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /

Answer 1:

Yes, it works. However, Google has a tool to test your robots.txt file: just go to Google Webmaster Tools (https://www.google.com/webmasters/tools/) and open the section "Site configuration -> Crawler access".



Answer 2:

All you would need is:

user-agent: *
Crawl-delay: 0

Sitemap: <Sitemap url>

Allow: /folder1/folder2/folder3
Disallow: /folder1/
Allow: /

At least Googlebot will honor the more specific Allow rule for that one directory while still disallowing everything else under folder1. This is backed up by this post by a Google employee.
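For context: when an Allow and a Disallow rule both match a URL, Googlebot picks the rule with the longest matching path prefix. The following Python sketch illustrates that precedence; it is a hypothetical helper, not any library's API, and it ignores * wildcards and $ anchors for simplicity:

def is_allowed(path, rules):
    # rules is a list of (directive, prefix) pairs,
    # e.g. ("Allow", "/folder1/folder2/folder3").
    best = ("Allow", "")  # everything is allowed by default
    for directive, prefix in rules:
        # the longest matching prefix wins
        if path.startswith(prefix) and len(prefix) > len(best[1]):
            best = (directive, prefix)
    return best[0] == "Allow"

rules = [("Allow", "/folder1/folder2/folder3"),
         ("Disallow", "/folder1/")]
print(is_allowed("/folder1/folder2/folder3/page", rules))  # True
print(is_allowed("/folder1/page", rules))                  # False

Because /folder1/folder2/folder3 is a longer prefix than /folder1/, the Allow wins for anything under folder3.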



Answer 3:

Blank lines are not allowed inside a record, so your original robots.txt should look like this:

user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /

Possible improvements:

  • Specifying Allow: / is superfluous, as it’s the default anyway.

  • Specifying Disallow: /folder1/folder2/ is superfluous, as Disallow: /folder1/ is sufficient.

  • As the Sitemap field is not per record but applies to all bots, you could specify it in a separate block.

So your robots.txt could look like this:

User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/

Sitemap: http://example.com/sitemap

(Note that the Allow field is not part of the original robots.txt specification, so don’t expect all bots to understand it.)
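If you want to double-check how a parser interprets the final file, Python's standard-library robots.txt parser is one option. Note that it does understand Allow, though it applies the first matching rule in file order rather than Google's longest-match rule; for this particular file, both strategies give the same answers. A minimal sketch:

from urllib.robotparser import RobotFileParser

# Parse the simplified robots.txt from above.
rules = """\
User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://example.com"
for path in ("/folder1/folder2/folder3/page",
             "/folder1/folder2/page",
             "/folder1/page",
             "/page"):
    verdict = "allowed" if rp.can_fetch("*", base + path) else "disallowed"
    print(path, "->", verdict)

This should print allowed for /folder1/folder2/folder3/page and /page, and disallowed for the other two paths.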