Run Nutch 2.3.1 on Hadoop 2

Posted 2019-07-25 22:07

Question:

I want to run Nutch 2.3.1 to crawl data on Hadoop 2. My Hadoop 2 cluster has 3 nodes:

  • crawler1:master
  • crawler2:slave
  • crawler3:slave

I deployed Nutch 2.3.1 to crawler1 and ran it with the following command: /usr/local/nutch/deploy/bin/crawl hdfs://xxx.xxx.xxx.xxx/urls/seed.txt test 5

It works and crawls data, but the crawl job appears to run only on crawler1; the other nodes do no work for Nutch.
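
A possible cause worth checking (an assumption, not a confirmed diagnosis for this cluster): in Hadoop 2 the `mapreduce.framework.name` property defaults to `local`, in which case MapReduce jobs run inside the client process on the machine that submits them rather than being handed to YARN. A minimal sketch of the entry that selects YARN follows; the file location in the comment assumes the standard Hadoop 2 layout and may differ on this installation.

```xml
<!-- mapred-site.xml, assumed to live in $HADOOP_HOME/etc/hadoop/ on the
     node that submits the job (crawler1 here). -->
<configuration>
  <property>
    <!-- Hadoop's default value is "local"; "yarn" submits jobs to the cluster. -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

If this property is already set to `yarn`, the cause lies elsewhere and this sketch does not apply.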

My questions are:

  1. Do I need to deploy Nutch to crawler2 and crawler3?
  2. Do I need to run the crawl command on all 3 nodes?
  3. If my steps are wrong, what are the right steps?

Sorry for my poor English. I really appreciate any help you can provide.

Tags: hadoop nutch