The error java.lang.ClassNotFoundException when running spark-submit


Question:

I want to run spark-submit for my Scala Spark application. These are the steps I took:

1) Execute Maven Clean and Package from IntelliJ IDEA to get myTest.jar.
2) Execute the following spark-submit command:

spark-submit --name 28 --master local[2] --class org.test.consumer.TestRunner \
/usr/tests/test1/target/myTest.jar \
$arg1 $arg2 $arg3 $arg4 $arg5

This is the TestRunner object that I want to run:

package org.test.consumer

import org.test.consumer.kafka.KafkaConsumer

object TestRunner {

  def main(args: Array[String]): Unit = {

    // expects exactly five positional arguments, in this order
    val Array(zkQuorum, group, topic1, topic2, kafkaNumThreads) = args

    val processor = new KafkaConsumer(zkQuorum, group, topic1, topic2)
    processor.run(kafkaNumThreads.toInt)

  }

}

But the spark-submit command fails with the following message:

java.lang.ClassNotFoundException: org.test.consumer.TestRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)

I don't really understand why the object TestRunner cannot be found if the package is specified correctly... Does it have something to do with using an object instead of a class?
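
A quick way to check whether TestRunner actually ended up inside the jar is to list the jar's contents (a diagnostic sketch, assuming the JDK's jar tool is on the PATH):

jar tf /usr/tests/test1/target/myTest.jar | grep TestRunner

If this prints nothing, the class was never compiled into the jar, and the problem is in the build rather than in the spark-submit command.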

UPDATE:

The project structure (the scala folder is currently marked as a Sources root):

/usr/tests/test1
  .idea
  src
    main
      docker
      resources
      scala
        org
          test
             consumer
                kafka
                    KafkaConsumer.scala
                TestRunner.scala
    test
  target

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.test.abc</groupId>
    <artifactId>consumer</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.11</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.sedis</groupId>
            <artifactId>sedis_2.11</artifactId>
            <version>1.2.2</version>
        </dependency>
        <dependency>
            <groupId>com.lambdaworks</groupId>
            <artifactId>jacks_2.11</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>1.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib-local_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.github.nscala-time</groupId>
            <artifactId>nscala-time_2.11</artifactId>
            <version>2.12.0</version>
        </dependency>
    </dependencies>

</project>

Answer 1:

@FiofanS, the problem is in your directory structure.

Maven follows a convention-over-configuration policy. This means that, by default, Maven expects you to follow the set of rules it has defined. For example, it expects all your code to be in the src/main/java directory (see the Maven Standard Directory Layout). But your code is not in src/main/java; it is in src/main/scala, and by default Maven will not consider src/main/scala a source location.

Although Maven expects you to follow the rules it has defined, it does not enforce them; it also provides ways to configure things based on your preference. In your case, you have to explicitly instruct Maven to also treat src/main/scala as one of your source locations.

To do this, you can use the Build Helper Maven Plugin. Add the following snippet within the <project>...</project> tag in your pom.xml:

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>1.7</version>
        <executions>
          <execution>
            <id>add-source</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>src/main/scala</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

This should solve your problem.
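
One caveat: registering the extra source root only tells Maven where the sources live; the default maven-compiler-plugin only compiles .java files and will ignore your .scala sources. If the rebuilt jar still lacks your classes, you will likely also need a Scala compiler plugin in the build. Below is a minimal sketch using scala-maven-plugin, added next to the build-helper plugin inside the same <plugins> block (the plugin version is an assumption; adjust it to your environment):

      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <!-- compile Scala sources as part of the normal build lifecycle -->
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

After adding it, rebuild with Maven Clean and Package and re-run the spark-submit command; TestRunner should then be found.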