What is the use of the addJar() method in Spark?

Posted 2020-04-16 02:17

In a Spark job, I don't know how to import and use the jars shared via SparkContext.addJar(). It seems that this method can move jars to some place that is accessible by the other nodes in the cluster, but I do not know how to import them.
This is an example:

package utils;

public class addNumber {
    public int addOne(int i){
        return i + 1;
    }
    public int addTwo(int i){
        return i + 2;
    }
}

I create a class called addNumber and package it into a jar file, utils.jar.

Then I create a Spark job; the code is shown below:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependencies {
  def main(args:Array[String]): Unit = {
    val sparkConf = new SparkConf
    val sc = new SparkContext(sparkConf)
    sc.addJar("/path/to/utils.jar")

    val data = (1 to 100).toList
    val rdd = sc.makeRDD(data)

    val rdd_1 = rdd.map ( x => {
      val handler = new utils.addNumber
      handler.addOne(x)
    } )

    rdd_1.collect().foreach { x => print(x + "||") }
  }
}

The error "java.lang.NoClassDefFoundError: utils/addNumber" raised after submission of the job through command "spark-submit".

I know that addJar() does not guarantee that the jars are included in the classpath of the Spark job. If I want to use the jar files, I have to move all of the dependencies to the same path on each node of the cluster. But if I can move and include all of the jars myself, what is the use of addJar()?

I am wondering whether there is a way to use the jars added by addJar(). Thanks in advance.

1 Answer
够拽才男人
Answered 2020-04-16 03:04

Did you try setting the path of the jar with the "local:" prefix? From the documentation:

public void addJar(String path)

Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
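
For example, here is a rough sketch of the different path forms addJar() accepts (the paths below are placeholders, not taken from your setup):

import org.apache.spark.{SparkConf, SparkContext}

object AddJarExamples {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("addJar-examples").setMaster("local[*]"))

    // A file on the driver's local filesystem: Spark copies it and serves it to the executors.
    sc.addJar("/path/to/utils.jar")

    // A file in HDFS (or another Hadoop-supported filesystem).
    sc.addJar("hdfs:///libs/utils.jar")

    // "local:" means the jar already exists at this path on every worker node, so nothing is copied.
    sc.addJar("local:/opt/libs/utils.jar")

    sc.stop()
  }
}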

You can also try this:

val conf = new SparkConf()
             .setMaster("local[*]")
             .setAppName("tmp")
             .setJars(Array("/path1/one.jar", "/path2/two.jar"))

val sc = new SparkContext(conf)

Also take a look at the spark.jars option in the Spark configuration.
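
For instance, a sketch that sets spark.jars directly as a configuration key (the paths are placeholders):

val conf = new SparkConf()
  .setAppName("tmp")
  // Comma-separated list of jars to include on the driver and executor classpaths.
  .set("spark.jars", "/path1/one.jar,/path2/two.jar")

val sc = new SparkContext(conf)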

and set "--jars" param in spark-submit:

--jars /path/1.jar,/path/2.jar
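
A complete command might look like this (the application jar name is just a placeholder):

spark-submit \
  --class TestDependencies \
  --jars /path/to/utils.jar \
  /path/to/your-spark-job.jar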

Or edit conf/spark-defaults.conf:

spark.driver.extraClassPath /path/1.jar:/fullpath/2.jar
spark.executor.extraClassPath /path/1.jar:/fullpath/2.jar
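
Note that extraClassPath does not copy anything; the jars must already exist at those paths on every node. If you prefer, the same settings can be passed on the command line:

spark-submit \
  --conf spark.driver.extraClassPath=/path/1.jar:/fullpath/2.jar \
  --conf spark.executor.extraClassPath=/path/1.jar:/fullpath/2.jar \
  ...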