Class not found in Hadoop job

Posted 2020-05-06 12:16

I have a MapReduce job that reads its input from DocumentDB. I've added the jar files under the lib directory in my source tree and also used the -libjars option when running the job, but I still get a class-not-found error for a class that is in one of those jar files. Here is part of my driver program:

public class MapReduceDriver extends Configured implements Tool  {

public static void main(String[] args) throws Exception {

    int res = ToolRunner.run(new Configuration(), new MapReduceDriver(), args);
    System.exit(res);

}



@Override
public int run(String[] args) throws Exception {

    Configuration conf =  this.getConf();
    ....

With -libjars, I tried putting the required jar files on the local drive and, separately, on HDFS, but neither worked. How can I make sure that -libjars works?

P.S. I'm using a 2-node HDInsight cluster (running in Microsoft Azure).

Here is the error message I get:

 Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.microsoft.azure.documentdb.hadoop.DocumentDBInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1961)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class com.microsoft.azure.documentdb.hadoop.DocumentDBInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1867)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1959)
    ... 8 more

2 answers
够拽才男人
Reply #2 · 2020-05-06 12:34

HDInsight submits jobs through Templeton, which doesn't support -libjars, so you can't use that option (see the Templeton docs).

Also, I'm assuming you are building a custom HDInsight cluster using a PowerShell script. You can copy all the jars, together with their dependencies, to HADOOP_HOME + '\share\hadoop\common\lib', which is the Hadoop lib folder.
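The copy step above could be sketched in Java roughly as follows. This is only an illustration, not the answerer's script: the class name JarCopier and both directory arguments are hypothetical, and it assumes the copy is run on every node where tasks execute, since each task JVM reads its classpath from the local Hadoop lib folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copies dependency jars into a Hadoop lib folder.
final class JarCopier {

    /** Copies every *.jar directly under sourceDir into targetDir and returns the copied file names. */
    static List<String> copyJars(Path sourceDir, Path targetDir) throws IOException {
        List<String> copied = new ArrayList<>();
        Files.createDirectories(targetDir);
        // The glob is matched against file names, so only jars at the top level are picked up.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, targetDir.resolve(jar.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied.add(jar.getFileName().toString());
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the connector jar and its dependencies (assumed)
        // args[1]: the Hadoop lib folder, i.e. share/hadoop/common/lib under HADOOP_HOME
        if (args.length != 2) {
            System.err.println("usage: JarCopier <jar-dir> <hadoop-lib-dir>");
            return;
        }
        List<String> copied = copyJars(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + copied.size() + " jar(s): " + copied);
    }
}
```

Existing jars with the same name are overwritten, which keeps repeated runs idempotent; drop REPLACE_EXISTING if you would rather fail on a name clash.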

Alternatively, you can use the published PowerShell script directly and change only the path that points to the dependency jars: upload your jars to an Azure blob container and replace the path in the script.

兄弟一词,经得起流年.
Reply #3 · 2020-05-06 12:43

I assume you are referring to the DocumentDB Hadoop connector jar found here: https://github.com/Azure/azure-documentdb-hadoop

The jar does not include its dependencies. You can either have Maven retrieve them for you, or download them manually and add them to the build path yourself.
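One way to see which classes are still missing is a small classpath probe like the sketch below. The class name ClasspathCheck is hypothetical, and the probe only inspects the JVM it runs in. The stack trace in the question comes from the map task (YarnChild), so a clean result on the client does not prove the task classpath is complete; to check that, the probe would have to run with the same classpath as the task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic: reports which of the given classes cannot be loaded.
final class ClasspathCheck {

    /** Returns the subset of classNames that the current class loader cannot load. */
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError, e.g. a missing superclass.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults to the class named in the question's stack trace.
        List<String> names = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("com.microsoft.azure.documentdb.hadoop.DocumentDBInputFormat");
        List<String> missing = findMissing(names);
        if (missing.isEmpty()) {
            System.out.println("All classes were found on this classpath.");
        } else {
            System.out.println("Missing on this classpath: " + missing);
        }
    }
}
```

Pass fully qualified class names as arguments to probe the connector's dependencies as well as the input format itself.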

Here are the dependencies:
