How to set up a robot.txt which only allows the de

Say I have a site on http://example.com. I would really like allowing bots to see the home page, but any other page need to blocked as it is pointless to spider. In other words

http://example.com & http://example.com/ should be allowed, but http://example.com/anything and http://example.com/someendpoint.aspx should be blocked.

Further it would be great if I can allow certain query strings to passthrough to the home page: http://example.com?okparam=true

but not http://example.com?anythingbutokparam=true

标签： web-crawler bots robots.txt googlebot slurp

5条回答

成全新的幸福

2楼-- · 2019-03-12 03:37

So after some research, here is what I found - a solution acceptable by the major search providers: google , yahoo & msn (I could on find a validator here) :

User-Agent: *
Disallow: /*
Allow: /?okparam=
Allow: /$

The trick is using the $ to mark the end of URL.

0人赞添加讨论(0) 举报

我欲成王，谁敢阻挡

3楼-- · 2019-03-12 03:46

Google's Webmaster Tools report that disallow always takes precedence over allow, so there's no easy way of doing this in a robots.txt file.

You could accomplish this by puting a noindex,nofollow META tag in the HTML every page but the home page.

0人赞添加讨论(0) 举报

看我几分像从前

4楼-- · 2019-03-12 03:49

Basic robots.txt:

Disallow: /subdir/

I don't think that you can create an expression saying 'everything but the root', you have to fill in all sub directories.

The query string limitation is also not possible from robots.txt. You have to do it in the background code (the processing part), or maybe with server rewrite-rules.

0人赞添加讨论(0) 举报

闹够了就滚

5楼-- · 2019-03-12 03:52

Disallow: *
Allow: index.ext

If I remember correctly the second clause should override the first.

0人赞添加讨论(0) 举报

萌系小妹纸

6楼-- · 2019-03-12 03:53

As far as I know, not all the crawlers support Allow tag. One possible solution might be putting everything except the home page into another folder and disallowing that folder.

0人赞添加讨论(0) 举报

How to set up a robot.txt which only allows the de

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间