Please help, I'm stuck. Here is my command to run the job:
hadoop jar mrjob.jar ru.package.Main -files hdfs://0.0.0.0:8020/MyCatalog/jars/metadata.csv -libjars hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/result_file
I get these WARNs:
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar is not on the local filesystem. Ignoring.
Then: Exception in thread "main" java.lang.NoClassDefFoundError, thrown on the line in the Main class where I try to instantiate a class from the jar named my-utils.jar.
- All these jars are in hdfs (I see them through the file browser)
- my-utils.jar does contain the class whose absence triggers the NoClassDefFoundError
What am I doing wrong?
UPD: I'm inspecting the source code of GenericOptionsParser:
/**
 * If libjars are set in the conf, parse the libjars.
 * @param conf
 * @return libjar urls
 * @throws IOException
 */
public static URL[] getLibJars(Configuration conf) throws IOException {
  String jars = conf.get("tmpjars");
  if (jars == null) {
    return null;
  }
  String[] files = jars.split(",");
  List<URL> cp = new ArrayList<URL>();
  for (String file : files) {
    Path tmp = new Path(file);
    if (tmp.getFileSystem(conf).equals(FileSystem.getLocal(conf))) {
      cp.add(FileSystem.getLocal(conf).pathToFile(tmp).toURI().toURL());
    } else {
      LOG.warn("The libjars file " + tmp + " is not on the local " +
          "filesystem. Ignoring.");
    }
  }
  return cp.toArray(new URL[0]);
}
So: 1. no spaces around the commas; 2. I still don't get it... I've tried pointing to the local file system and to the hdfs file system; the result is the same. It seems the class is not added...
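(For jars that already sit in hdfs, the distributed cache, rather than -libjars, appears to be the intended route. A minimal sketch, assuming the Hadoop 2 "new" mapreduce Job API and reusing the paths from the command above; the class name is illustrative:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithHdfsJars {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "http requests job");
    job.setJarByClass(SubmitWithHdfsJars.class);
    // Unlike -libjars, which expects local files, the distributed cache
    // accepts paths that are already in hdfs and ships them to the tasks
    job.addFileToClassPath(new Path("/MyCatalog/jars/opencsv.jar"));
    job.addFileToClassPath(new Path("/MyCatalog/jars/gson.jar"));
    job.addFileToClassPath(new Path("/MyCatalog/jars/my-utils.jar"));
    // Note: this only affects the task classpath; the driver JVM still
    // needs the classes locally if Main itself instantiates them.
    // ... set mapper/reducer/input/output, then job.waitForCompletion(true)
  }
}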
Problem is solved. The correct invocation is the same command as above, where the -files argument is an hdfs path and the -libjars arguments are local fs paths. The problem was in the job jar. Previously I tried to run the job using a simple jar with only three classes: Mapper, Reducer, and the Main class. Now I provide the other jar generated by Maven (it generates two of them). The second job jar contains all the dependency libs inside it. The structure looks like:
my-job.jar
-lib
--aopalliance-1.0.jar
--asm-3.2.jar
--avro-1.5.4.jar
...
--commons-beanutils-1.7.0.jar
--commons-beanutils-core-1.8.0.jar
...
--zookeeper-3.4.3-cdh4.0.0.jar
There are 76 jars inside the lib folder.
It works, but I don't understand why.
Just because the jars are on HDFS doesn't mean they are on the classpath of the job you are running. (The Maven-built jar works because Hadoop's job-jar format treats everything under the jar's lib/ directory as part of the job's classpath.)
If you really just want to fix this problem, I would use maven to build a "fat jar" which contains all your dependencies in a single jar. You can do this using the shade plugin.
But, looking at your command, it looks wrong. I think you might have better luck using the "job" command with -libjars, described here. I'm not sure that you can specify external jars using the "hadoop jar" command.
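One caveat: -libjars (with local paths) is only honored when the driver passes its arguments through GenericOptionsParser, which the ToolRunner pattern does for you. A minimal sketch of such a driver (class and job names are illustrative; Hadoop 2 "new" mapreduce API assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Main extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() already reflects whatever GenericOptionsParser pulled out
    // of -files/-libjars/-D; args holds only the leftover arguments
    Job job = Job.getInstance(getConf(), "my job");
    job.setJarByClass(Main.class);
    // setMapperClass/setReducerClass etc. omitted for brevity
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner runs GenericOptionsParser first, so generic options
    // like -libjars are honored even under "hadoop jar"
    System.exit(ToolRunner.run(new Configuration(), new Main(), args));
  }
}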