Can anyone please tell me if there is any way for Apache Nutch to ignore or bypass robots.txt while crawling? I am using Nutch 2.2.1. I found that "RobotRulesParser.java" (full path: src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java) is responsible for reading and parsing robots.txt. Is there any way to modify this file so the crawler ignores robots.txt and carries on crawling?
Or is there any other way to achieve the same?
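For context on what such a modification would look like: the general idea would be to make the rules-parsing code return a rules object whose checks always pass, so every URL is treated as allowed. Below is a minimal, self-contained sketch of that shape in plain Java. The class and method names here are simplified stand-ins for illustration only, not the actual Nutch or crawler-commons API.

```java
// Hypothetical stand-in for a robots-rules object whose checks always
// pass. In a real patch, the parser would be changed to return an
// equivalent "allow everything" rules instance instead of the rules
// parsed from the fetched robots.txt.
public class AllowAllRobotRules {

    // Every URL is reported as allowed, regardless of robots.txt.
    public boolean isAllowed(String url) {
        return true;
    }

    // No crawl delay is imposed either.
    public long getCrawlDelay() {
        return 0L;
    }
}
```

Note that bypassing robots.txt is generally considered bad crawling etiquette and may violate the terms of the sites being crawled, so this kind of change should be limited to sites you own or have permission to crawl.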