I'm trying to run a crawl using Nutch in Eclipse.
I'm using a file called urls, and it contains
http://www.google.com/
However, when I run the project, the Generator class tells me that:
"0 records selected for fetching, exiting"
How can I solve this issue?
I've followed these guides:
http://wiki.apache.org/nutch/RunNutchInEclipse1.0
http://wiki.apache.org/nutch/NutchTutorial
Any help would be greatly appreciated.
I recently ran into this issue and found that most responses concerned regex-urlfilter.txt or crawl-urlfilter.txt. Another thing to check is your '-topN' setting. It caps how many URLs the generator selects in each round, so it needs to be large enough that URLs which pass all the filters are actually selected.
I hope this helps.
It's most likely your regex-urlfilter.txt. Try using this and see if it fixes the problem:
-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|png|PNG|ico|js|ICO|doc|mp3|MP3|DOC|css|rss|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
+.
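
If you want to check whether your seed URL survives a rule set like the one above before running the crawl, here is a rough sketch in Python. It only imitates the way I understand the regex filter to work: rules are tried top to bottom, the first pattern found in the URL decides ('+' accepts, '-' rejects), and a URL that matches nothing is dropped. The names parse_rules and accepts are made up for this example and are not part of Nutch. Nutch itself uses Java regular expressions, and I am assuming the suffix rule is meant to start with an escaped dot.

```python
import re

# Rules from the answer above. Assumption: the suffix rule starts with an
# escaped dot (\.) so that it only matches real file extensions.
RULES = r"""
-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|png|PNG|ico|js|ICO|doc|mp3|MP3|DOC|css|rss|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
+.
"""


def parse_rules(text):
    """Turn each non-empty, non-comment line into an (accept, pattern) pair."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The first character is the sign, the rest is the regular expression.
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(url, rules):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    # No rule matched: treat the URL as rejected.
    return False


rules = parse_rules(RULES)

for url in ("http://www.google.com/",
            "ftp://example.com/file",
            "http://example.com/logo.png"):
    print(url, "accepted" if accepts(url, rules) else "rejected")
```

If the seed URL comes out as rejected with your own rule file, the generator has nothing to select, which would explain the "0 records selected for fetching" message.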