Robots.txt: allow only major SE

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?

Tags: web-crawler robots.txt

4 answers

#2 · 2019-03-17 13:36

As everyone knows, robots.txt is a convention that crawlers follow voluntarily, so only well-behaved agents obey it. Against badly behaved crawlers, having one or not makes no difference.

If you have some data that is not meant to be shown on the site at all, you can just change its permissions to improve security.


ら.Afraid

#3 · 2019-03-17 13:46

Why?

Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt. So you're only going to be blocking legitimate search engines, as robots.txt compliance is voluntary.

But if you insist on doing it anyway, that's what the User-agent: line in robots.txt is for.

User-agent: googlebot
Disallow: 

User-agent: *
Disallow: /

With lines for all the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list.
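One way to sanity-check rules like these before deploying them is Python's standard-library `urllib.robotparser`. The sketch below inlines the rules from this answer and asks whether a given crawler token may fetch a page. The `is_allowed` helper, the `SomeOtherBot` token, and the example.com URL are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from this answer, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page.html"  # hypothetical URL
    for agent in ("Googlebot", "SomeOtherBot"):  # SomeOtherBot is a made-up token
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

This only shows what a compliant crawler would conclude from the file. As noted above, a crawler that ignores robots.txt is not affected by any of it.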


三岁会撩人

#4 · 2019-03-17 13:55

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Slurp
Allow: /

User-agent: msnbot
Disallow:

Slurp is Yahoo's robot.


走好不送

#5 · 2019-03-17 13:56

There are more than three major search engines, depending on which country you are talking about. Facebook seems to do a good job of listing only the legitimate ones: https://facebook.com/robots.txt

So your robots.txt can be something like:

User-agent: Applebot
Allow: /

User-agent: baiduspider
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Facebot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: msnbot
Allow: /

User-agent: Naverbot
Allow: /

User-agent: seznambot
Allow: /

User-agent: Slurp
Allow: /

User-agent: teoma
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: Yandex
Allow: /

User-agent: Yeti
Allow: /

User-agent: *
Disallow: /
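Since an allow-list like this is repetitive, it could also be generated from a list of crawler tokens. The following is a minimal sketch, not a definitive tool: `build_robots_txt` and `ALLOWED_AGENTS` are hypothetical names, the list is only a subset of the agents above, and the result is checked with the standard-library `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of the crawler tokens listed above; extend as needed.
ALLOWED_AGENTS = ["Googlebot", "Bingbot", "Slurp"]


def build_robots_txt(allowed_agents):
    """Build an allow-list robots.txt: one group per agent, then a catch-all block."""
    groups = [f"User-agent: {agent}\nAllow: /\n" for agent in allowed_agents]
    groups.append("User-agent: *\nDisallow: /\n")
    # Each group ends with a newline, so joining on "\n" leaves a blank line between groups.
    return "\n".join(groups)


if __name__ == "__main__":
    text = build_robots_txt(ALLOWED_AGENTS)
    print(text)

    # Check the generated rules with the standard-library parser.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for agent in ("Bingbot", "UnlistedBot"):  # UnlistedBot is a made-up token
        print(agent, parser.can_fetch(agent, "https://example.com/"))
```

Keeping the catch-all `User-agent: *` group last mirrors the listing above; compliant parsers are expected to pick the most specific matching group regardless of order.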
