I have a three-node cluster running Hadoop 2.2.0 and HBase 0.98.1, and I need to run the Nutch 2.2.1 crawler on top of it. However, Nutch 2.2.1 only supports Hadoop versions from the 1.x branch. So far I am able to submit a Nutch job to my cluster, but it fails with java.lang.NumberFormatException. So my question is pretty simple: how do I make Nutch work in my environment?
At the moment it's impossible to integrate Nutch 2.2.1 (Gora 0.3) with HBase 0.98.x. See: https://issues.apache.org/jira/browse/GORA-304
The official Nutch tutorial recommends only the HBase 0.90.x branch: http://wiki.apache.org/nutch/Nutch2Tutorial
Alternatively, you can download the HBase 0.94.24-hadoop-2.5.0 build, which I created and tested today: https://github.com/dobromyslov/hbase/releases/tag/0.94.24-hadoop-2.5.0
Note that Nutch 2.2.1 does not support HBase 0.94.x, so you have to get the latest Nutch 2.x from the Git branch: https://github.com/apache/nutch/tree/2.x
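Once a 2.x build is in place, Nutch still has to be pointed at HBase as its Gora storage backend. A minimal sketch of the relevant `conf/nutch-site.xml` entry follows. It is based on the Nutch2Tutorial linked above, and whether the property name and class are unchanged on the current 2.x branch is an assumption, so check it against the branch's own conf files:

```xml
<!-- conf/nutch-site.xml: select HBase as the Gora storage backend.
     Assumption: property name and store class as described in the Nutch2Tutorial. -->
<configuration>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
</configuration>
```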