Spark java.lang.NoSuchMethodError From Janino and Commons-Compiler


Question:

I am building an application that uses Spark for Random Forest based classification. When I try to run the program, I get an exception from this line:

StringIndexerModel labelIndexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel").fit(data);

It looks like the code somehow ends up using Janino version 2.7.8, although I understand I need 3.0.7. I have no idea how to set the dependencies correctly to force the build to use the correct version; it always seems to resolve to 2.7.8.

Is it possible that I somehow need to clean the cache?

Here is the line from gradle dependencies:

+--- org.codehaus.janino:janino:3.0.7 -> 2.7.8
|    +--- org.codehaus.janino:commons-compiler:3.0.7
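
For reference, Gradle's dependencyInsight task shows which path in the dependency graph pulls 2.7.8 in; the configuration name below assumes the compile configuration this build uses:

gradle dependencyInsight --dependency janino --configuration compile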

The Gradle section defining the dependencies:

dependencies {
  compile('org.apache.hadoop:hadoop-mapreduce-client-core:2.7.2') { force = true }
  compile('org.apache.hadoop:hadoop-common:2.7.2') { force = true }
  // https://mvnrepository.com/artifact/org.codehaus.janino/janino
  compile (group: 'org.codehaus.janino', name: 'janino', version: '3.0.7') {
    force = true
    exclude group: 'org.codehaus.janino', module: 'commons-compiler'
  }
  // https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler
  compile (group: 'org.codehaus.janino', name: 'commons-compiler', version: '3.0.7') {
    force = true
    exclude group: 'org.codehaus.janino', module: 'janino'
  }
  // https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.11
  compile (group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.2.0') {
    exclude group: 'org.codehaus.janino', module: 'janino'
    exclude group: 'org.codehaus.janino', module: 'commons-compiler'
  }
  // https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11
  compile (group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.2.0') {
    exclude group: 'org.codehaus.janino', module: 'janino'
    exclude group: 'org.codehaus.janino', module: 'commons-compiler'
  }
  // https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.11
  compile (group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.2.0') {
    exclude group: 'org.codehaus.janino', module: 'janino'
    exclude group: 'org.codehaus.janino', module: 'commons-compiler'
  }
  // https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind
  runtime group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
  // https://mvnrepository.com/artifact/com.fasterxml.jackson.module/jackson-module-scala_2.11
  runtime group: 'com.fasterxml.jackson.module', name: 'jackson-module-scala_2.11', version: '2.6.5'
  compile group: 'com.google.code.gson', name: 'gson', version: '2.8.1'
  compile group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.4.1'
  compile group: 'org.apache.logging.log4j', name: 'log4j-core', version: '2.4.1'
  testCompile 'org.testng:testng:6.9.4'
  testCompile 'org.mockito:mockito-core:1.10.19'
}
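
As an aside, another common way to pin a version in Gradle is a resolution-strategy rule that applies to every configuration. A minimal sketch, assuming the coordinates above are exactly the ones that need pinning:

configurations.all {
  resolutionStrategy {
    // Override whatever janino/commons-compiler version a transitive dependency requests.
    force 'org.codehaus.janino:janino:3.0.7'
    force 'org.codehaus.janino:commons-compiler:3.0.7'
  }
}

On the cache question: running the build with gradle build --refresh-dependencies makes Gradle re-verify cached module metadata instead of reusing it.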

The exception:

Exception in thread "main" java.lang.NoSuchMethodError: org.codehaus.commons.compiler.Location.<init>(Ljava/lang/String;SS)V
    at org.codehaus.janino.Scanner.location(Scanner.java:261)
    at org.codehaus.janino.Parser.location(Parser.java:2742)
    at org.codehaus.janino.Parser.parseImportDeclarationBody(Parser.java:209)
    at org.codehaus.janino.ClassBodyEvaluator.makeCompilationUnit(ClassBodyEvaluator.java:255)
    at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:222)
    at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:192)
    at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:960)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1027)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1024)
    at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
    at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:906)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:375)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
    at org.apache.spark.sql.execution.DeserializeToObjectExec.doExecute(objects.scala:95)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:2581)
    at org.apache.spark.sql.Dataset.rdd(Dataset.scala:2578)
    at org.apache.spark.ml.feature.StringIndexer.fit(StringIndexer.scala:111)

Answer 1:

Maybe you have solved the problem already; I ran into the same error today. However, I did not understand why you added these exclusions, and they don't seem right to me:

  // https://mvnrepository.com/artifact/org.codehaus.janino/janino
  compile (group: 'org.codehaus.janino', name: 'janino', version: '3.0.7') {
    force = true
    exclude group: 'org.codehaus.janino', module: 'commons-compiler'
  }
  // https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler
  compile (group: 'org.codehaus.janino', name: 'commons-compiler', version: '3.0.7') {
    force = true
    exclude group: 'org.codehaus.janino', module: 'janino'
  }

We only needed to exclude org.codehaus.janino:commons-compiler from org.apache.spark:spark-mllib_2.11 (the other Spark modules are already present as transitive dependencies of mllib, so there is no need to declare them or to exclude commons-compiler from each of them individually) and then add org.codehaus.janino:commons-compiler:3.0.7 back explicitly.

Here's the dependency block from a working project. My project is built with Maven, but it should be straightforward to convert to the Gradle equivalent (a rough sketch follows the XML below).

<!--Spark Libraries-->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
    <!--Dropping Logger Dependencies-->
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
    <!--Dropping commons-compiler-->
    <exclusions>
        <exclusion>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.8</version>
</dependency>
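
A rough Gradle translation of the same idea, sketched here for convenience rather than copied from a tested build:

  // Drop the transitive commons-compiler that spark-mllib brings in...
  compile (group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.2.0') {
    exclude group: 'org.codehaus.janino', module: 'commons-compiler'
  }
  // ...and add it back explicitly, at a version matching the janino that Spark pulls in.
  compile group: 'org.codehaus.janino', name: 'commons-compiler', version: '3.0.7'

The important part is that the explicit commons-compiler version matches the janino version that actually ends up on the classpath.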

Note: commons-compiler 2.7.8 was also working fine for me when used with a Spring Boot release version and Elasticsearch 2.4. I only had to upgrade to 3.0.8 after we moved to the Spring Boot 2.0.0.M7 milestone and Elasticsearch 5.6.



Answer 2:

Similar to Zobayer Hasan's answer, I also needed to update org.codehaus.janino:commons-compiler to version 3.0.8.

In my case, I was only using org.apache.spark:spark-sql_2.11, but I found that it depended on org.codehaus.janino:janino version 3.0.8, and org.codehaus.janino:commons-compiler version 3.0.0. I was able to fix my problem by adding version 3.0.8 of commons-compiler to Maven's dependencyManagement without using exclusions:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>3.0.8</version>
        </dependency>
    </dependencies>
</dependencyManagement>


Answer 3:

The technique below has helped me many times:

System.out.println(TheGhostClass.class.getProtectionDomain().getCodeSource().getLocation());

where TheGhostClass (org.codehaus.commons.compiler.Location in your case) is the class that may be "lost" because an older version of the same library is being preferred over the one your program expects. This happens most often when the client software is deployed into a dominant container that comes armed with its own classloaders and plenty of ancient versions of the most popular libraries.
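
Expanded into a minimal, self-contained example for this case (the class name JaninoLocationCheck is only an illustrative name, not something from the original post):

import org.codehaus.commons.compiler.Location;

public class JaninoLocationCheck {
    public static void main(String[] args) {
        // Prints the JAR (or directory) the Location class was actually loaded from,
        // which reveals whether an old janino/commons-compiler artifact is shadowing 3.0.x.
        System.out.println(Location.class.getProtectionDomain().getCodeSource().getLocation());
    }
}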