Is there a list of known web crawlers? [closed]

Published 2019-03-18 08:40

I'm trying to get accurate download numbers for some files on a web server. I look at the user agents: some are clearly bots or web crawlers, but for many I'm not sure whether they are crawlers or not. Since they account for a lot of downloads, it's important for me to know.

Is there a list of known web crawlers somewhere, with documentation such as user agents, IPs, behavior, etc.?

I'm not interested in the official ones, like Google's, Yahoo's, or Microsoft's. Those are generally well behaved and self-identified.

4 Answers
成全新的幸福
#2 · 2019-03-18 09:07

I usually use http://www.user-agents.org/ as a reference; hope this helps you out.

You can also try http://www.robotstxt.org/db.html or http://www.botsvsbrowsers.com.

Luminary・发光体
#3 · 2019-03-18 09:22

I'm maintaining a list of crawler user-agent patterns at https://github.com/monperrus/crawler-user-agents/.

It's collaborative, you can contribute to it with pull requests.
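A minimal sketch of how such a pattern list is typically consumed, assuming each entry carries a `pattern` field holding a regular expression to match against the `User-Agent` header (the three patterns below are illustrative stand-ins, not the full list):

```javascript
// Illustrative subset of a crawler pattern list; a real list such as
// crawler-user-agents contains hundreds of entries.
const crawlerPatterns = [
  { pattern: "Googlebot" },
  { pattern: "bingbot" },
  { pattern: "[wW]get" },
];

// Returns true if the user-agent string matches any known crawler pattern.
function isCrawler(userAgent) {
  return crawlerPatterns.some(function (entry) {
    return new RegExp(entry.pattern).test(userAgent);
  });
}

// isCrawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)") → true
// isCrawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")             → false
```

You would run every `User-Agent` from your access log through `isCrawler` and count only the misses as real downloads; note this only catches crawlers that identify themselves.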

乱世女痞
#4 · 2019-03-18 09:25

Unfortunately, we've found that bot activity is too numerous and varied to filter accurately. If you want accurate download counts, your best bet is to require JavaScript to trigger the download. That's basically the only thing that reliably filters out the bots; it's also why site traffic analytics engines these days are all JavaScript-based.
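One way the JavaScript-triggered approach can look, as a hedged sketch: the `/count-download` counter endpoint and `/files/` path below are hypothetical names you would implement on your own server, not part of any existing API.

```javascript
// Build the tracking URL for a file. "/count-download" is a hypothetical
// server endpoint that increments the download counter.
function trackingUrl(file) {
  return "/count-download?file=" + encodeURIComponent(file);
}

// Wire a link so the counter is only pinged when JavaScript actually runs,
// which most simple crawlers fetching raw hrefs will never do.
// Expected markup: <a href="#" data-file="report.pdf">Download</a>
function wireDownloadLink(link) {
  link.addEventListener("click", function (event) {
    event.preventDefault();
    const file = link.dataset.file;
    // sendBeacon survives the page navigating away; fall back to fetch.
    if (navigator.sendBeacon) {
      navigator.sendBeacon(trackingUrl(file));
    } else {
      fetch(trackingUrl(file));
    }
    window.location.href = "/files/" + file; // start the actual download
  });
}
```

Since the count only fires from a script-driven click handler, a crawler would have to execute the page's JavaScript and simulate a click to be counted, which excludes the vast majority of bots.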

啃猪蹄的小仙女
#5 · 2019-03-18 09:30

http://www.robotstxt.org/db.html is a good place to start. They have an automatable raw feed if you need that too. http://www.botsvsbrowsers.com/ is also helpful.
