The title could also be: "What are the differences between the Maven and SBT assembly plugins?"
I ran into this issue while migrating a project from Maven to SBT.
To describe the problem I have created an example project with dependencies that I found to behave differently, depending on the build tool.
https://github.com/atais/mvn-sbt-assembly
The only dependencies are (sbt style):
"com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
"org.apache.cassandra" % "cassandra-all" % "3.4",
What I do not understand is why mvn package creates the fat jar successfully, while sbt assembly reports conflicts:
[error] 39 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] /home/siatkowskim/.ivy2/cache/org.slf4j/jcl-over-slf4j/jars/jcl-over-slf4j-1.7.7.jar:org/apache/commons/logging/<some classes>
[error] /home/siatkowskim/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:org/apache/commons/logging/<some classes>
...
[error] /home/siatkowskim/.ivy2/cache/com.github.stephenc.high-scale-lib/high-scale-lib/jars/high-scale-lib-1.1.2.jar:org/cliffc/high_scale_lib/<some classes>
[error] /home/siatkowskim/.ivy2/cache/com.boundary/high-scale-lib/jars/high-scale-lib-1.0.6.jar:org/cliffc/high_scale_lib/<some classes>
...
This is an extension to Alexey Romanov's answer.
I have also updated my project with a detailed explanation, so you might want to check it out.
Following the advice:
"You can verify it for this case by unpacking the jar Maven produces and the dependency jars in SBT error message, then checking which .class file Maven used."
I compared the fat jar produced by Maven with the fat jars produced by sbt using:
- MergeStrategy.first, which showed some extra files
- MergeStrategy.last, which showed binary differences and extra files
As a next step, I checked the fat jars against the dependencies where sbt found conflicts, specifically:
- jcl-over-slf4j-1.7.7.jar
- commons-logging-1.1.1.jar
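A check like this can also be done without unpacking anything by hand: hash the same entry in the fat jar and in each candidate dependency jar, then see which hashes match. The following is only a minimal sketch of that idea. The object name WhichClassWon, the method entryDigest and the argument layout are made up for illustration and are not part of the example project.

```scala
import java.security.MessageDigest
import java.util.zip.ZipFile

// Hypothetical helper: reports which dependency jar supplied a given entry of a fat jar.
object WhichClassWon {

  /** SHA-256 (hex) of one entry inside a jar, or None if the jar has no such entry. */
  def entryDigest(jarPath: String, entryName: String): Option[String] = {
    val zip = new ZipFile(jarPath)
    try {
      Option(zip.getEntry(entryName)).map { entry =>
        val in = zip.getInputStream(entry)
        try {
          val md = MessageDigest.getInstance("SHA-256")
          val buf = new Array[Byte](8192)
          var n = in.read(buf)
          while (n != -1) {
            md.update(buf, 0, n)
            n = in.read(buf)
          }
          md.digest().map("%02x".format(_)).mkString
        } finally in.close()
      }
    } finally zip.close()
  }

  def main(args: Array[String]): Unit = {
    if (args.length < 3) {
      println("usage: WhichClassWon <fat.jar> <path/inside/jar.class> <dep1.jar> [<dep2.jar> ...]")
    } else {
      val fatJar = args(0)
      val entry = args(1)
      val packaged = entryDigest(fatJar, entry)
      println(s"$fatJar -> ${packaged.getOrElse("entry not found")}")
      // Compare every candidate dependency jar against what ended up in the fat jar.
      args.drop(2).foreach { dep =>
        val digest = entryDigest(dep, entry)
        val verdict = if (digest.isDefined && digest == packaged) "MATCH" else "differs or missing"
        println(s"$dep -> $verdict")
      }
    }
  }
}
```

You would pass the fat jar first, then an entry path from the sbt error message (for example a class under org/apache/commons/logging/), then the conflicting jars from the Ivy cache.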
Conclusion
maven-assembly-plugin resolves conflicts at the jar level. When it finds a conflict, it picks the first jar and simply ignores all the content from the other.
sbt-assembly, in contrast, mixes all the class files, resolving conflicts locally, file by file.
My theory is that if your fat jar made with maven-assembly-plugin works, you can specify MergeStrategy.first for all the conflicts in sbt.
The only difference would be that the jar produced with sbt will be even bigger, containing extra classes that were ignored by Maven.
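As a sketch of what that could look like in build.sbt, assuming the sbt-assembly plugin is already enabled: the fragment below applies MergeStrategy.first only to the two package paths visible in the error excerpt above and leaves everything else to the plugin's default strategy. The remaining conflicts are elided in the log, so these patterns are illustrative and would need extending.

```scala
assemblyMergeStrategy in assembly := {
  // Packages shipped by both jcl-over-slf4j and commons-logging: keep the first copy found
  case PathList("org", "apache", "commons", "logging", xs @ _*) => MergeStrategy.first
  // Packages shipped by both high-scale-lib artifacts: keep the first copy found
  case PathList("org", "cliffc", "high_scale_lib", xs @ _*) => MergeStrategy.first
  // Anything else: fall back to the strategy sbt-assembly would have used anyway
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Scoping MergeStrategy.first to known paths, rather than using it as a catch-all, means a new and unexpected conflict still fails the build instead of being resolved silently.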
It seems maven-assembly-plugin resolves conflicts equivalently to MergeStrategy.first (not sure if it's completely equivalent) by just picking one of the files in an unspecified way when jar-with-dependencies is used (since it only has one phase):
If two or more elements (e.g., file, fileSet) select different sources for the same file for archiving, only one of the source files will be archived.
As per version 2.5.2 of the assembly plugin, the first phase to add the file to the archive "wins". The filtering is done solely based on name inside the archive, so the same source file can be added under different output names. The order of the phases is as follows: 1) FileItem 2) FileSets 3) ModuleSet 4) DepenedencySet and 5) Repository elements.
Elements of the same type will be processed in the order they appear in the descriptors. If you need to "overwrite" a file included by a previous set, the only way to do this is to exclude that file from the earlier set.
Note that this behaviour was slightly different in earlier versions of the assembly plugin.
Even if one of the conflicting files would work for all of your dependencies (which isn't necessarily so), Maven doesn't know which one, so you can just silently get the wrong result. Silently at build-time, I mean; at runtime you can get e.g. AbstractMethodError, or again just a wrong result.
You can influence which file gets picked by writing your own descriptor, but it's horribly verbose; there's no equivalent to just writing MergeStrategy.first/last (and concat/discard are not allowed).
The SBT plugin could do the same: default to a strategy when you don't specify one, but then, well, you could silently get the wrong result.
From the build.sbt I can see that there is no merge strategy in your build. There is also a stray "," in your libraryDependencies key, placed after the "org.apache.cassandra" % "cassandra-all" % "3.4" dependency, in the build.sbt of the project you linked above.
A merge strategy is required to handle duplicate files in the jar, as well as conflicting versions. The following is an example of how to put one in place in your build.
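assemblyMergeStrategy in assembly := {
  // Drop jar manifests and signature files from the dependencies
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  // Concatenate config files so every library keeps its defaults
  case "reference.conf" => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  // For any other duplicate, keep the first copy found
  case _ => MergeStrategy.first
}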
If your project has no sub-projects, you could also try a simple build file, such as the following build.sbt.
name := "assembly-test"

version := "0.1"

scalaVersion := "2.12.4"

libraryDependencies ++= Seq(
  "com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
  "org.apache.cassandra" % "cassandra-all" % "3.4"
)

mainClass in assembly := Some("com.atais.cassandra.MainClass")

assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "reference.conf" => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _ => MergeStrategy.first
}