hadoop -libjars and ClassNotFoundException

Published 2019-04-15 06:26

Please help, I'm stuck. Here is my command to run the job:

hadoop jar mrjob.jar ru.package.Main -files hdfs://0.0.0.0:8020/MyCatalog/jars/metadata.csv -libjars hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/result_file

I get these WARNs:

12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar is not on the local filesystem. Ignoring.

Then: Exception in thread "main" java.lang.NoClassDefFoundError on the line in the Main class where I try to instantiate a class from the jar named my-utils.jar.

  1. All these jars are in HDFS (I can see them through the file browser)
  2. my-utils.jar does contain the class whose absence causes the NoClassDefFoundError

What am I doing wrong?

UPD: I'm inspecting the source code of GenericOptionsParser:

    /**
     * If libjars are set in the conf, parse the libjars.
     * @param conf
     * @return libjar urls
     * @throws IOException
     */
    public static URL[] getLibJars(Configuration conf) throws IOException {
      String jars = conf.get("tmpjars");
      if (jars == null) {
        return null;
      }
      String[] files = jars.split(",");
      List<URL> cp = new ArrayList<URL>();
      for (String file : files) {
        Path tmp = new Path(file);
        if (tmp.getFileSystem(conf).equals(FileSystem.getLocal(conf))) {
          cp.add(FileSystem.getLocal(conf).pathToFile(tmp).toURI().toURL());
        } else {
          LOG.warn("The libjars file " + tmp + " is not on the local " +
              "filesystem. Ignoring.");
        }
      }
      return cp.toArray(new URL[0]);
    }

So: 1. there must be no spaces around the commas; 2. I still don't get it... I've tried pointing to the local file system and to the HDFS file system; the result is the same. It seems the class is not added...
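
Judging by the source above, the warning only means that jars on HDFS are skipped when the driver's own classpath is built; they never reach the client JVM. For the task side, the Hadoop 1.x API of that era offers a programmatic way to attach HDFS-hosted jars to the job classpath via the distributed cache. Below is a minimal sketch, reusing the paths from the question; the class name is invented for illustration:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class ClasspathSetup {
      // Registers jars that already live on HDFS in the distributed cache and
      // on the task classpath. Note that this does not help the driver JVM
      // itself, which still needs these classes on its local classpath.
      public static void addHdfsJars(Configuration conf) throws IOException {
        DistributedCache.addFileToClassPath(new Path("/MyCatalog/jars/opencsv.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/MyCatalog/jars/gson.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/MyCatalog/jars/my-utils.jar"), conf);
      }
    }

Since the NoClassDefFoundError here is thrown in the Main class, i.e. in the driver JVM, that distinction matters: the missing class has to be available locally, no matter what ends up in tmpjars.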

2 Answers
爷、活的狠高调
#2 · 2019-04-15 06:54

Problem solved. The correct invocation is:

hadoop jar my-job.jar ru.package.Main -files /home/cloudera/uploaded_jars/metadata.csv -libjars /home/cloudera/uploaded_jars/opencsv.jar,/home/cloudera/uploaded_jars/gson.jar,/home/cloudera/uploaded_jars/url-raiting-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/scoring_result

where

/MyCatalog

is an HDFS path, and

/home/cloudera/uploaded_jars/

is a local FS path. The problem was in the job jar. Previously I tried to run the job using a simple jar with only three classes: the Mapper, the Reducer, and the Main class. Now I provide the other jar generated by Maven (it generates two of them). The second job jar contains all the dependency libs inside it. The structure looks like:

    my-job.jar
      lib/
        aopalliance-1.0.jar
        asm-3.2.jar
        avro-1.5.4.jar
        ...
        commons-beanutils-1.7.0.jar
        commons-beanutils-core-1.8.0.jar
        ...
        zookeeper-3.4.3-cdh4.0.0.jar

There are 76 jars inside the lib folder.

It works, but I don't understand why (presumably because Hadoop automatically adds jars packaged under the job jar's lib/ directory to the job's classpath).

ゆ 、 Hurt°
#3 · 2019-04-15 07:04

Just because they are on HDFS doesn't mean they are on the classpath of the job you are running.

If you really just want to fix this problem, I would use Maven to build a "fat jar" that contains all your dependencies in a single jar. You can do this using the Shade plugin.
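
A minimal Shade configuration might look like the sketch below (the version number is an assumption from that era; adjust it to your environment):

    <!-- Hedged sketch: minimal maven-shade-plugin setup; version is an assumption -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>1.7.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

With this in the <build><plugins> section, mvn package produces a single jar with all dependencies unpacked into it, so nothing extra has to be shipped via -libjars.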

But looking at your command, it looks wrong. I think you might have better luck using the "job" command with -libjars, described here. I'm not sure that you can specify external jars using the "hadoop jar" command.
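
One detail worth noting: with "hadoop jar", the generic options (-files, -libjars) are only honored when the driver runs them through ToolRunner (or GenericOptionsParser directly). Below is a minimal sketch of such a driver, assuming the input/output arguments from the question; class and job names are illustrative:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Main extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() already carries whatever -files/-libjars wrote into the configuration
        Job job = new Job(getConf(), "http-requests");
        job.setJarByClass(Main.class);
        // setMapperClass/setReducerClass etc. omitted for brevity
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options and passes only the
        // remaining arguments (input path, output path) to run()
        System.exit(ToolRunner.run(new Main(), args));
      }
    }

If the Main class parses args directly in main() instead, -files and -libjars are passed through as ordinary arguments and silently do nothing.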
