I want to run Nutch 2.3.1 to crawl data on Hadoop 2. I have 3 Hadoop 2 nodes:
- crawler1:master
- crawler2:slave
- crawler3:slave
I deployed Nutch 2.3.1 to crawler1 and ran it with the following command:
/usr/local/nutch/deploy/bin/crawl hdfs://xxx.xxx.xxx.xxx/urls/seed.txt test 5
It works and crawls data, but it looks like the crawl job runs only on crawler1; the other nodes do not do any work for Nutch.
My questions are:
- Do I need to deploy Nutch to crawler2 and crawler3?
- Do I need to run the crawl command on all 3 nodes?
- If my steps are wrong, what are the right steps?
Sorry for my poor English. I really appreciate any help you can provide.