I'm using Spark (via the Java API) and need a single jar that can be pushed to the cluster; however, the jar itself should not include Spark. The app that deploys the jobs should, of course, include Spark.
I would like:
1. `sbt run` - everything should be compiled and executed
2. `sbt smallAssembly` - create a jar without Spark
3. `sbt assembly` - create an uber jar with everything (including Spark) for ease of deployment
I have 1. and 3. working. Any ideas on how I can do 2.? What code would I need to add to my build.sbt file?
The question is not only relevant to Spark; it applies to any other dependency I may wish to exclude as well.
For beginners like me, simply add `% "provided"` to the Spark dependencies in build.sbt to exclude them from the uber-jar, as sketched below.
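For illustration, a minimal build.sbt sketch; the Spark module and version below are placeholders for whatever your project actually depends on:

```scala
// build.sbt -- mark the Spark dependency as "provided" so sbt-assembly leaves it out of the uber-jar.
// The artifact name and version are placeholders; use whichever Spark modules your project needs.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1" % "provided"
```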
.% "provided" configuration
The first option to exclude a jar from the fat jar is to use the `"provided"` configuration on the library dependency. `"provided"` comes from Maven's provided scope, which marks a dependency you expect the JDK or a container to supply at runtime, so it is available on the compile and test classpaths but is not packaged. Since you're deploying your code to a container (in this case Spark), contrary to your comment you'd probably need the Scala standard library and other library jars (e.g. Dispatch if you used it). This won't affect `run` or `test`.

`packageBin`
If you just want your source code, with no Scala standard library or other library dependencies, that would be `packageBin`, built into sbt. This packaged jar can be combined with a dependency-only jar that you can make using sbt-assembly's `assemblyPackageDependency`; see the sketch after this paragraph.
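A sketch of the plugin wiring, assuming sbt-assembly is pulled in via project/plugins.sbt; the plugin version below is just a placeholder:

```scala
// project/plugins.sbt -- bring in sbt-assembly, which provides the assembly and assemblyPackageDependency tasks.
// The version is a placeholder; use a current release.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")
```

With that in place, `sbt package` produces the source-only jar (`packageBin`), and `sbt assemblyPackageDependency` produces a jar containing only the dependencies.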
`excludedJars in assembly`
The final option is to use `excludedJars in assembly`.
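As a rough sketch of that approach (the jar-name filter below is only an assumption; match whatever jars you actually want to keep out of the assembly):

```scala
// build.sbt -- exclude specific jars from the assembly by filtering the assembly classpath.
// The startsWith("spark-") predicate is illustrative; adjust it to the jar names you want to drop.
excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { jar => jar.data.getName.startsWith("spark-") }
}
```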