Hadoop accessing 3rd party libraries from local file system

Posted 2019-02-19 11:30

Question:

I have a jar file on all my Hadoop nodes at /home/ubuntu/libs/javacv-0.9.jar, along with some other jar files.

When my MapReduce application is executing on the Hadoop nodes, I get this exception:

java.io.FileNotFoundException: File does not exist hdfs://192.168.0.18:50000/home/ubuntu/libs/javacv-0.9.jar

How can I resolve this exception? How can my jar running in Hadoop access 3rd party libraries from the local file system of the Hadoop node?

Answer 1:

You need to copy your file to HDFS rather than leaving it only on the local filesystem.

To copy a file to HDFS, use:

hadoop fs -put localfile hdfsPath

Another option is to change the file path to:

file:///home/ubuntu/libs/javacv-0.9.jar

To add jar files to the classpath, take a look at DistributedCache:

DistributedCache.addFileToClassPath(new Path("file:///home/ubuntu/libs/javacv-0.9.jar"), job.getConfiguration());

You may need to iterate over all jar files in that directory.
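A minimal sketch of that iteration, assuming the jars sit in the same directory on every node; the class name LocalLibs, the addLocalJars method, and the libDir parameter are placeholders, not part of the original answer:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class LocalLibs {
    // Registers every jar under a local directory on the task classpath.
    // file:// paths are resolved on each node's local filesystem, so the
    // jars must already exist at this path on every node.
    public static void addLocalJars(Job job, String libDir) throws IOException {
        File[] jars = new File(libDir).listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return; // directory missing or unreadable
        }
        for (File jar : jars) {
            DistributedCache.addFileToClassPath(
                    new Path("file://" + jar.getAbsolutePath()),
                    job.getConfiguration());
        }
    }
}

The file:// prefix is what keeps Hadoop from resolving the path against HDFS, which is exactly the mistake behind the FileNotFoundException in the question.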



Answer 2:

Another option is to use the DistributedCache's addFileToClassPath(new Path("/myapp/mylib.jar"), job.getConfiguration()) call to submit the jar files that should be added to the classpath of your mapper and reducer tasks.

Note: Make sure you copy the jar file to HDFS first.
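A hedged sketch of that flow, assuming the upload is done from the driver; the class name HdfsLibSetup, the stageJar helper, and both paths are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class HdfsLibSetup {
    // Uploads a local jar into HDFS once, then registers it on the classpath
    // of every mapper and reducer task through the distributed cache.
    public static void stageJar(Job job, String localJar, String hdfsJar) throws IOException {
        Configuration conf = job.getConfiguration();
        FileSystem fs = FileSystem.get(conf);

        fs.copyFromLocalFile(new Path(localJar), new Path(hdfsJar));
        job.addFileToClassPath(new Path(hdfsJar));
    }
}

A call such as stageJar(job, "/home/ubuntu/libs/javacv-0.9.jar", "/myapp/javacv-0.9.jar") would then cover both the copy-to-HDFS step and the classpath registration.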

You can also add jar files to the classpath by using the Hadoop command-line argument -libjars <jar_file>.

Note: Make sure your MapReduce driver implements the Tool interface and is launched through ToolRunner, so that the -libjars option is parsed from the command line.
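A minimal driver sketch for that, assuming the new mapreduce API; MyJobDriver, the job name, and the omitted mapper/reducer setup are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects generic options such as -libjars and -D
        Job job = Job.getInstance(getConf(), "my-job");
        job.setJarByClass(MyJobDriver.class);
        // ... configure mapper, reducer, input and output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options before calling run()
        System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
    }
}

It could then be launched with something like:

hadoop jar myapp.jar MyJobDriver -libjars /home/ubuntu/libs/javacv-0.9.jar <input> <output>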