robots.txt parser java

Posted 2019-04-07 04:53

I want to know how to parse a robots.txt file in Java.

Is there existing code or a library for this?

3 answers
倾城 Initia
#2 · 2019-04-07 05:27

There is also a new release of crawler-commons:

https://github.com/crawler-commons/crawler-commons

The library aims to implement functionality common to any web crawler, and this includes a very handy robots.txt parser.
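To show roughly what such a parser does, here is a minimal sketch that uses only the JDK. It is not the crawler-commons API: the class `SimpleRobotsTxt` and its methods `parse` and `isAllowed` are hypothetical names made up for illustration. It assumes plain prefix matching where the longest matching rule wins, and it ignores wildcards (`*`, `$` in paths), `Crawl-delay` and `Sitemap` lines, and fetching the file over HTTP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, minimal robots.txt rule set for one crawler (prefix matching only). */
final class SimpleRobotsTxt {

    /** One Allow/Disallow line: a path prefix plus whether it permits access. */
    private static final class Rule {
        final String prefix;
        final boolean allow;
        Rule(String prefix, boolean allow) { this.prefix = prefix; this.allow = allow; }
    }

    private final List<Rule> rules;

    private SimpleRobotsTxt(List<Rule> rules) { this.rules = rules; }

    /** Parses robots.txt text and keeps the rules that apply to agentName. */
    static SimpleRobotsTxt parse(String content, String agentName) {
        String agent = agentName.toLowerCase(Locale.ROOT);
        List<Rule> specific = new ArrayList<>();  // rules from groups naming our agent
        List<Rule> wildcard = new ArrayList<>();  // rules from "User-agent: *" groups
        boolean matchedSpecific = false;
        boolean inSpecific = false;
        boolean inWildcard = false;
        boolean lastWasAgent = false;

        for (String raw : content.split("\\r?\\n")) {
            int hash = raw.indexOf('#');          // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;              // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                // Consecutive User-agent lines share one group; otherwise start a new one.
                if (!lastWasAgent) { inSpecific = false; inWildcard = false; }
                String v = value.toLowerCase(Locale.ROOT);
                if (v.equals("*")) {
                    inWildcard = true;
                } else if (!v.isEmpty() && agent.contains(v)) {
                    inSpecific = true;
                    matchedSpecific = true;
                }
                lastWasAgent = true;
            } else if (field.equals("allow") || field.equals("disallow")) {
                lastWasAgent = false;
                if (value.isEmpty()) continue;    // an empty Disallow restricts nothing
                Rule rule = new Rule(value, field.equals("allow"));
                if (inSpecific) specific.add(rule);
                if (inWildcard) wildcard.add(rule);
            }
        }
        // A group that names the agent takes precedence over the "*" group.
        return new SimpleRobotsTxt(matchedSpecific ? specific : wildcard);
    }

    /** Longest matching prefix wins; Allow wins a tie; no match means allowed. */
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) continue;
            if (best == null
                    || r.prefix.length() > best.prefix.length()
                    || (r.prefix.length() == best.prefix.length() && r.allow)) {
                best = r;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        String robots = String.join("\n",
                "User-agent: *",
                "Disallow: /private/",
                "Allow: /private/public.html",
                "",
                "User-agent: examplebot",
                "Disallow: /");
        SimpleRobotsTxt rules = SimpleRobotsTxt.parse(robots, "mycrawler");
        System.out.println(rules.isAllowed("/index.html"));
        System.out.println(rules.isAllowed("/private/data.html"));
    }
}
```

The agent names `mycrawler` and `examplebot` are placeholders. For a real crawler, one of the libraries mentioned in this thread is the safer choice, since they handle the edge cases this sketch skips.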

祖国的老花朵
#3 · 2019-04-07 05:46

Heritrix is an open-source web crawler written in Java. Looking through their javadoc, I see that they have a utility class Robotstxt for parsing the robots.txt file.

乱世女痞
#4 · 2019-04-07 05:49

There's also the jrobotx library, hosted at SourceForge.

(Full disclosure: I spun off the code that forms that library.)
