sbt assembly shading to create fat jar to run on s

2020-07-05 05:31发布

问题:

I'm using sbt assembly to create a fat jar which can run on spark. Have dependencies on grpc-netty. Guava version on spark is older than the one required by grpc-netty and I run into this error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I was able to resolve this by setting userClassPathFirst to true on spark, but leads to other errors.

Correct me if I am wrong, but from what I understand, I shouldn't have to set userClassPathFirst to true if I do shading correctly. Here's how I do shading now:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.guava.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)

libraryDependencies ++= Seq(
  "org.scalaj" %% "scalaj-http" % "2.3.0",
  "org.json4s" %% "json4s-native" % "3.2.11",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.2.0" % "provided",
  "org.clapper" %% "argot" % "1.0.3",
  "com.typesafe" % "config" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided",
  "io.grpc" % "grpc-netty" % "1.1.2",
  "com.google.guava" % "guava" % "20.0"
)

What am I doing wrong here and how do I fix it?

回答1:

You are almost there. What shadeRule does is it renames class names, not library names:

The main ShadeRule.rename rule is used to rename classes. All references to the renamed classes will also be updated.

In fact, in com.google.guava:guava there are no classes with package com.google.guava:

$ jar tf ~/Downloads/guava-20.0.jar  | sed -e 's:/[^/]*$::' | sort | uniq
META-INF
META-INF/maven
META-INF/maven/com.google.guava
META-INF/maven/com.google.guava/guava
com
com/google
com/google/common
com/google/common/annotations
com/google/common/base
com/google/common/base/internal
com/google/common/cache
com/google/common/collect
com/google/common/escape
com/google/common/eventbus
com/google/common/graph
com/google/common/hash
com/google/common/html
com/google/common/io
com/google/common/math
com/google/common/net
com/google/common/primitives
com/google/common/reflect
com/google/common/util
com/google/common/util/concurrent
com/google/common/xml
com/google/thirdparty
com/google/thirdparty/publicsuffix

It should be enough to change your shading rule to this:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)

So you don't need to change userClassPathFirst.

Moreover, you can simplify your shading rule like this:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1").inAll
)

Since org.apache.spark dependencies are provided, they will not be included in your jar and will not be shaded (hence spark will use its own unshaded version of guava that it has on the cluster).