I am following the Scala tutorial at https://spark.apache.org/docs/2.1.0/quick-start.html
My Scala file:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/data/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}
and my build.sbt:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.12.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.2.0"
I ran sbt package successfully (I had already deleted everything except the Scala source code and build.sbt, then ran sbt package again):
[info] Loading project definition from /home/cpu11453local/workspace/testspark_scala/project
[info] Loading settings from build.sbt ...
[info] Set current project to Simple Project (in build file:/home/my_name/workspace/testspark_scala/)
[info] Packaging /home/my_name/workspace/testspark_scala/target/scala-2.12/simple-project_2.12-1.0.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed Nov 8, 2017 12:15:24 PM
However, when I run spark-submit:
$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] simple-project_2.12-1.0.jar
I get this error:
java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
Full spark-submit output on gist
As @Alexey said, changing the Scala version to 2.11 fixed the problem.
build.sbt:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.11"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"
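Note that the Scala version in build.sbt MUST MATCH the Scala version Spark was built with.
Look at the artifactId: spark-core_2.11 means the artifact was built for Scala 2.11. Scala minor versions such as 2.10, 2.11 and 2.12 are not binary compatible with each other, neither backward nor forward.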
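To keep the suffix and scalaVersion from drifting apart, sbt's %% operator can derive the suffix for you. The following is only a sketch of the same build.sbt, assuming the Spark installation used by spark-submit was built for Scala 2.11; adjust scalaVersion if yours differs:

```scala
// build.sbt (sketch): let sbt derive the artifact suffix from scalaVersion
name := "Simple Project"
version := "1.0"
// Assumption: the Spark install used by spark-submit was built for Scala 2.11
scalaVersion := "2.11.11"
// %% appends the Scala binary version, so this resolves to spark-core_2.11
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
```

With %%, a mismatch such as scalaVersion 2.12 combined with spark-core_2.10 cannot be written by accident. It does not guarantee that the Spark runtime you submit to uses the same Scala version, so that still has to be checked separately.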
The following are the build.sbt entries from the sample for the latest Spark release (2.4.1), as shown in the Spark/Scala online guide:
name := "SimpleApp"
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1"
Everything works fine inside the IntelliJ IDE, but after creating the package with the 'sbt package' command and running spark-submit from the command line, the application still fails with the following exception:
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
The spark-submit command was:
spark-submit -v --class SimpleApp --master local[*] target\scala-2.12\simpleapp_2.12-1.0.jar
I had a similar issue while following the instructions at https://spark.apache.org/docs/2.4.3/quick-start.html
My setup details:
Spark version: 2.4.3
Scala version: 2.12.8
However, when I changed my sbt file to the configuration below, everything worked fine (both compiling and running the application jar):
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.11"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"
It looks like the Spark 2.4.3 installation I am using works only with Scala 2.11.x. While compiling the sample project, sbt downloaded the Scala 2.11 library from "https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.11.11"
There is definitely some confusion regarding the Scala version to be used for Spark 2.4.3.
As of today (Nov 25, 2019), the documentation home page for Spark 2.4.3 states:
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.3 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
Note that support for Java 7, Python 2.6 and old Hadoop versions before 2.6.5 were removed as of Spark 2.2.0. Support for Scala 2.10 was removed as of 2.3.0. Support for Scala 2.11 is deprecated as of Spark 2.4.1 and will be removed in Spark 3.0.
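According to the documentation, then, the Scala version is supposed to be 2.12, even though the answers above report that only 2.11 worked with their installation.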
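Since the documentation and the reports above disagree, it may be simpler to ask the installation itself which Scala version it runs on. This is a minimal sketch, assuming you can start spark-shell from the same $SPARK_HOME that you use for spark-submit; the value names runtimeScala and binaryScala are only illustrative:

```scala
// Run inside spark-shell started from the same $SPARK_HOME that spark-submit uses.
// This reports the Scala library on the classpath, i.e. the one bundled with that Spark install.
val runtimeScala: String = scala.util.Properties.versionNumberString

// The binary version is the first two components, e.g. "2.11" for "2.11.12".
val binaryScala: String = runtimeScala.split('.').take(2).mkString(".")

println(s"Scala runtime: $runtimeScala, binary version: $binaryScala")
```

Whatever binary version this prints is the one scalaVersion in build.sbt should belong to. The banner printed by spark-submit --version should report the same Scala version, which is a quick way to cross-check without opening a shell.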