Package a multiple-entry jar using Maven for Hadoop

Posted 2019-09-17 04:49

Question:

I'm new to Maven. I want to package my Hadoop project into a jar together with its dependencies, and then run it like:

hadoop jar project.jar com.abc.def.SomeClass1 -params ...
hadoop jar project.jar com.abc.def.AnotherClass -params ...

And I want this jar to have multiple entry points (different Hadoop jobs).

How could I do it?

Thanks!

Answer 1:

There are two ways to create a jar with dependencies:

  1. Hadoop supports a jar-within-a-jar format, meaning your jar can contain a lib folder of jars that will be added to the classpath at job submission and during map/reduce task execution
  2. You can unpack the jar dependencies and re-pack them with your classes into a single monolithic jar.

The first requires you to write a custom Maven assembly descriptor file, but in practice it's more hassle than it's worth. The second also uses the Maven assembly plugin, but relies on a built-in descriptor. To use it, add the following to the project -> build -> plugins section of your pom:

<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <!-- bind the single goal to the package phase so "mvn package" builds the assembly -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Now when you run mvn package you'll get two jars in your target folder:

  1. ${project.artifactId}-${project.version}.jar - which contains just the classes and resources of your project
  2. ${project.artifactId}-${project.version}-jar-with-dependencies.jar - which contains your classes and resources plus everything from your dependency tree with compile scope, unpacked and repacked into a single jar
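With the plugin bound to the package phase as above, you submit the second jar exactly as in the question, e.g. hadoop jar target/myproject-1.0-jar-with-dependencies.jar com.abc.def.SomeClass1 -params ... (the myproject-1.0 part is hypothetical - substitute whatever artifactId and version your pom declares).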

For multiple entry points you don't need to do anything special: just make sure you don't define a Main-Class entry in the jar manifest (if you don't explicitly configure a manifest at all, the default one doesn't name a Main-Class, so you should be good). Each job class simply provides its own main method, and you name the class to run on the hadoop jar command line, as in the sketch below.
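For illustration, here is a minimal sketch of what one such entry point might look like, using Hadoop's standard Tool/ToolRunner pattern. The package and class name are taken from the question; the job name and the commented-out mapper/reducer wiring are hypothetical placeholders:

package com.abc.def;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// One entry point among many: this class is named on the command line,
// e.g. "hadoop jar project.jar com.abc.def.SomeClass1 <input> <output>".
public class SomeClass1 extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "some-job-1");
    job.setJarByClass(SomeClass1.class);
    // set your mapper/reducer and key/value classes here, e.g.
    // job.setMapperClass(MyMapper.class); job.setReducerClass(MyReducer.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  // Each job class ships its own main(); no Main-Class is needed in the manifest.
  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new SomeClass1(), args));
  }
}

AnotherClass from the question would simply be a second, independent class with its own main method in the same jar; the build doesn't need to know about either of them.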