Why Maven assembly works when SBT assembly find co

Posted 2019-03-25 12:07

Question:

The title could also be:
What are the differences between Maven and SBT assembly plugins.

I ran into this issue while migrating a project from Maven to SBT.

To describe the problem I have created an example project with dependencies that I found to behave differently, depending on the build tool.

https://github.com/atais/mvn-sbt-assembly


The only dependencies are (sbt style):

"com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
"org.apache.cassandra" % "cassandra-all" % "3.4",

What I do not understand is why mvn package creates the fat jar successfully, while sbt assembly reports conflicts:

[error] 39 errors were encountered during merge
[error] java.lang.RuntimeException: deduplicate: different file contents found in the following:
[error] /home/siatkowskim/.ivy2/cache/org.slf4j/jcl-over-slf4j/jars/jcl-over-slf4j-1.7.7.jar:org/apache/commons/logging/<some classes>
[error] /home/siatkowskim/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:org/apache/commons/logging/<some classes>
...
[error] /home/siatkowskim/.ivy2/cache/com.github.stephenc.high-scale-lib/high-scale-lib/jars/high-scale-lib-1.1.2.jar:org/cliffc/high_scale_lib/<some classes>
[error] /home/siatkowskim/.ivy2/cache/com.boundary/high-scale-lib/jars/high-scale-lib-1.0.6.jar:org/cliffc/high_scale_lib/<some classes>
...

Answer 1:

This extends Alexey Romanov's answer.

I have also updated my project with a detailed explanation, so you might want to check it out.

Following the advice:

You can verify it for this case by unpacking the jar Maven produces and the dependency jars in SBT error message, then checking which .class file Maven used.

I compared the fat jar produced by Maven with the ones produced by sbt using

  • MergeStrategy.first, which showed some extra files in the sbt jar
  • MergeStrategy.last, which showed binary differences and extra files

I then took the next step and checked the fat jars against the dependencies in which sbt found conflicts, specifically:

  • jcl-over-slf4j-1.7.7.jar
  • commons-logging-1.1.1.jar
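
For anyone who wants to repeat this comparison, here is a minimal sketch of one way to do it, not the exact procedure used above. It lists the entries of two jars and compares them by name and CRC-32 checksum. The JarDiff object and its method names are hypothetical helpers written for this illustration; only java.util.zip.ZipFile from the JDK is assumed.

```scala
import java.util.zip.ZipFile
import scala.collection.JavaConverters._

// Hypothetical helper for illustration; not part of Maven, sbt, or the example project.
object JarDiff {
  // Entries present in only one jar, and entries present in both with different contents.
  final case class Diff(onlyInA: Set[String], onlyInB: Set[String], differing: Set[String])

  // Read every non-directory entry of a jar as (entry name -> CRC-32 checksum).
  def entries(jarPath: String): Map[String, Long] = {
    val zip = new ZipFile(jarPath)
    try zip.entries().asScala.filterNot(_.isDirectory).map(e => e.getName -> e.getCrc).toMap
    finally zip.close()
  }

  // Compare two entry maps by name, then by checksum for the names they share.
  def compare(a: Map[String, Long], b: Map[String, Long]): Diff = {
    val shared = a.keySet intersect b.keySet
    Diff(a.keySet -- b.keySet, b.keySet -- a.keySet, shared.filter(n => a(n) != b(n)))
  }

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: JarDiff <first.jar> <second.jar>")
    val diff = compare(entries(args(0)), entries(args(1)))
    println(s"only in first:  ${diff.onlyInA.size}")
    println(s"only in second: ${diff.onlyInB.size}")
    diff.differing.toSeq.sorted.foreach(n => println(s"differs: $n"))
  }
}
```

Running it with the Maven fat jar as the first argument and one of the two dependency jars as the second shows which entries under org/apache/commons/logging/ carry the same bytes in both, which is enough to tell which jar a given class came from.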

Conclusion

maven-assembly-plugin resolves conflicts at the jar level. When it finds a conflict, it picks the first jar and simply ignores all content from the other.

sbt-assembly, by contrast, mixes all the class files and resolves conflicts locally, file by file.

My theory is that if your fat jar made with maven-assembly-plugin works, you can specify MergeStrategy.first for all the conflicts in sbt. The only difference would be that the jar produced with sbt will be even bigger, containing extra classes that were ignored by Maven.
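
A minimal sketch of that theory, assuming an sbt-assembly version that accepts the `assemblyMergeStrategy in assembly` syntax. Only the two package paths visible in the error output above are listed; the other conflicts are elided there and would need their own cases. Everything else falls through to the plugin's default strategy, so an unexpected conflict still fails the build instead of being resolved silently.

```scala
// build.sbt fragment (sketch)
assemblyMergeStrategy in assembly := {
  // jcl-over-slf4j vs commons-logging: keep the first copy in classpath order
  case PathList("org", "apache", "commons", "logging", _*) => MergeStrategy.first
  // the two high-scale-lib artifacts ship the same package
  case PathList("org", "cliffc", "high_scale_lib", _*)     => MergeStrategy.first
  // anything else: defer to the previously configured (default) strategy
  case other =>
    val defaultStrategy = (assemblyMergeStrategy in assembly).value
    defaultStrategy(other)
}
```

Listing the paths one by one is more work than a catch-all `case _ => MergeStrategy.first`, but it keeps each decision visible in the build file.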



Answer 2:

It seems maven-assembly-plugin resolves conflicts equivalently to MergeStrategy.first (not sure if it's completely equivalent) by just picking one of the files in an unspecified way when jar-with-dependencies is used (since it only has one phase):

If two or more elements (e.g., file, fileSet) select different sources for the same file for archiving, only one of the source files will be archived.

As per version 2.5.2 of the assembly plugin, the first phase to add the file to the archive "wins". The filtering is done solely based on name inside the archive, so the same source file can be added under different output names. The order of the phases is as follows: 1) FileItem 2) FileSets 3) ModuleSet 4) DependencySet and 5) Repository elements.

Elements of the same type will be processed in the order they appear in the descriptors. If you need to "overwrite" a file included by a previous set, the only way to do this is to exclude that file from the earlier set.

Note that this behaviour was slightly different in earlier versions of the assembly plugin.

Even if one of the conflicting files would work for all of your dependencies (which isn't necessarily so), Maven doesn't know which one, so you can just silently get the wrong result. Silently at build-time, I mean; at runtime you can get e.g. AbstractMethodError, or again just a wrong result.

You can influence which file gets picked by writing your own descriptor, but it's horribly verbose; there's no equivalent to just writing MergeStrategy.first/last (and concat/discard are not allowed).

The SBT plugin could do the same: default to a strategy when you don't specify one, but then, well, you could silently get the wrong result.
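
If you do know which jar should win, one way to make that choice explicit in sbt, rather than relying on ordering, is to leave the other jar out of the assembly altogether. The sketch below uses sbt-assembly's assemblyExcludedJars setting. Dropping commons-logging-1.1.1.jar is an assumption made purely for illustration (jcl-over-slf4j is meant as a drop-in replacement for it); whether that is the right choice for this project has not been verified.

```scala
// build.sbt fragment (sketch)
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  // Assumption for illustration: jcl-over-slf4j should win, so drop commons-logging entirely.
  cp.filter(_.data.getName == "commons-logging-1.1.1.jar")
}
```

This is closer to what Maven does at the jar level, with the difference that the losing jar is named in the build file.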



Answer 3:

From your build.sbt I can see that there is no merge strategy in your build. There is also a stray "," in your libraryDependencies key, after the "org.apache.cassandra" % "cassandra-all" % "3.4" dependency, in the build.sbt of the project you linked above.

A merge strategy is required to handle duplicate files in the jar, including files that differ between versions of a dependency. The following is an example of how to put one in place in your build.

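assemblyMergeStrategy in assembly := {
  // Drop jar manifests and signature files from the dependencies
  case m if m.toLowerCase.endsWith("manifest.mf")       => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")   => MergeStrategy.discard
  // Concatenate the reference.conf defaults from all dependencies
  case "reference.conf"                                 => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  // Everything else: keep the first copy in classpath order
  case _                                                => MergeStrategy.first
}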

You could try writing a simple build file if you do not have sub-projects in your project. You can try the following build.sbt.

name := "assembly-test"

version := "0.1"

scalaVersion := "2.12.4"

libraryDependencies ++= Seq(
      "com.netflix.astyanax" % "astyanax-cassandra" % "3.9.0",
      "org.apache.cassandra" % "cassandra-all" % "3.4"
)

mainClass in assembly := Some("com.atais.cassandra.MainClass")

assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")       => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")   => MergeStrategy.discard
  case "reference.conf"                                 => MergeStrategy.concat
  case x: String if x.contains("UnusedStubClass.class") => MergeStrategy.first
  case _                                                => MergeStrategy.first
}