I'm new to Maven. I want to package a jar of my Hadoop project with its dependencies, and then use it like:
hadoop jar project.jar com.abc.def.SomeClass1 -params ...
hadoop jar project.jar com.abc.def.AnotherClass -params ...
I also want this jar to have multiple entry points (different Hadoop jobs).
How could I do it?
Thanks!
There are two ways to create a jar with dependencies:
- Hadoop supports jars within a jar: your jar can contain a lib folder of jars, which are added to the classpath at job submission and during map/reduce task execution.
- You can unpack the dependency jars and repack them with your classes into a single monolithic jar.
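For reference, here is a minimal sketch of a descriptor for the first (lib folder) approach. It assumes a hypothetical file location such as src/main/assembly/hadoop-job.xml, referenced from the plugin's <descriptors><descriptor>...</descriptor></descriptors> element instead of <descriptorRefs>; the id "job" is an illustrative choice and becomes the jar's suffix.

```xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <!-- Hypothetical id: the output jar is named <artifactId>-<version>-job.jar -->
  <id>job</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <!-- Your own compiled classes and resources go at the root of the jar -->
    <fileSet>
      <directory>${project.build.outputDirectory}</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
  </fileSets>
  <dependencySets>
    <!-- Dependency jars are copied as-is (not unpacked) into lib/ -->
    <dependencySet>
      <outputDirectory>lib</outputDirectory>
      <useProjectArtifact>false</useProjectArtifact>
      <unpack>false</unpack>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
```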
The first requires you to write a Maven assembly descriptor file and, in practice, is more hassle than it's worth. The second also uses Maven assemblies but relies on a built-in descriptor. To use the second, add the following to the <build><plugins> section of your pom:
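<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>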
Now when you run mvn package you'll get two jars in your target folder:
${project.artifactId}-${project.version}.jar
- which contains only your project's classes and resources
${project.artifactId}-${project.version}-jar-with-dependencies.jar
- which contains your classes and resources plus everything in your dependency tree with compile or runtime scope, unpacked and repacked into a single jar
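Because the jar-with-dependencies descriptor only picks up compile- and runtime-scoped dependencies, one common option (an assumption about your setup, not something the descriptor requires) is to mark Hadoop itself as provided, since hadoop jar already puts the cluster's Hadoop jars on the classpath. A sketch, where hadoop.version is a hypothetical property you would define yourself; on older 1.x releases the artifact may be hadoop-core rather than hadoop-client:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- Hypothetical property: set it to the version your cluster runs -->
  <version>${hadoop.version}</version>
  <!-- provided: available at compile time, but left out of the assembled jar -->
  <scope>provided</scope>
</dependency>
```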
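For multiple entry points you don't need anything special; just make sure the jar manifest has no Main-Class entry. If the manifest names a Main-Class, hadoop jar runs that class and passes the class name you typed (e.g. com.abc.def.SomeClass1) to it as an ordinary argument. The default manifest doesn't declare a Main-Class, so this only matters if you configure the manifest explicitly.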
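If you do configure the manifest (for example to add implementation entries), a sketch of the plugin configuration would simply leave out <mainClass>; addDefaultImplementationEntries here is only an illustrative setting:

```xml
<configuration>
  <descriptorRefs>
    <descriptorRef>jar-with-dependencies</descriptorRef>
  </descriptorRefs>
  <archive>
    <manifest>
      <!-- No <mainClass> element, so hadoop jar takes the class name from the command line -->
      <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
    </manifest>
  </archive>
</configuration>
```

Each job is then selected at launch time, e.g. hadoop jar target/<artifactId>-<version>-jar-with-dependencies.jar com.abc.def.SomeClass1 -params ...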