In a Spark job, I don't know how to import and use the jars shared via the method SparkContext.addJar(). It seems that this method moves the jars to some place that is accessible by the other nodes in the cluster, but I do not know how to import them.
This is an example:
package utils;

public class addNumber {
    public int addOne(int i) {
        return i + 1;
    }

    public int addTwo(int i) {
        return i + 2;
    }
}
I create a class called addNumber and package it into a jar file, utils.jar.
Then I create a Spark job; the code is shown below:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependencies {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf
    val sc = new SparkContext(sparkConf)
    sc.addJar("/path/to/utils.jar")

    val data = (1 to 100).toList
    val rdd = sc.makeRDD(data)
    val rdd_1 = rdd.map { x =>
      val handler = new utils.addNumber
      handler.addOne(x)
    }
    rdd_1.collect().foreach { x => print(x + "||") }
  }
}
The error "java.lang.NoClassDefFoundError: utils/addNumber" is raised after I submit the job with spark-submit.
I know that addJar() does not guarantee that the jars are included in the classpath of the Spark job. If I want to use the jar files, I have to move all of the dependencies to the same path on each node of the cluster. But if I have to move and include all of the jars myself, what is the use of addJar()?
I am wondering if there is a way to actually use the jars imported by addJar(). Thanks in advance.
Did you try setting the path of the jar with the "local" prefix? From the documentation, the path passed to addJar can be a local file, a file in HDFS (or another Hadoop-supported filesystem), an HTTP/HTTPS/FTP URI, or local:/path for a file that already exists on every worker node.
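A minimal sketch of that variant, reusing the placeholder path from your question and assuming utils.jar has already been copied to that exact location on every worker node:

    // utils.jar must already exist at this path on every worker node
    sc.addJar("local:/path/to/utils.jar")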
You can try this way as well:
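A sketch of what I mean, registering the jar on the SparkConf before the context is created (the path is again just a placeholder):

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    // ship utils.jar with the application instead of calling addJar afterwards
    val sparkConf = new SparkConf().setJars(Seq("/path/to/utils.jar"))
    val sc = new SparkContext(sparkConf)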
Also take a look at the Spark configuration documentation and check the spark.jars option.
You can set the "--jars" parameter in spark-submit:
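For example, something like this (the application jar name here is just a placeholder; use your own):

    spark-submit --class TestDependencies --jars /path/to/utils.jar /path/to/your-spark-job.jar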
or edit conf/spark-defaults.conf:
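For example, a single line such as (again using the placeholder path from your question):

    spark.jars /path/to/utils.jar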