I am trying to read an XML file with Spark, but I am facing an issue when I compile the project with sbt.
build.sbt
name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" % "spark-avro_2.10" % "2.0.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.2"
resolvers += Resolver.mavenLocal
SparkMeApp.scala
package in.goai.spark
import scala.xml._
import com.databricks.spark.xml
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}
object SparkMeApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val fileName = args(0)
    val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "book").load(fileName)
    val selectedData = df.select("title", "price")
    val d = selectedData.show
    println(s"$d")
  }
}
When I compile it with "sbt package", it shows the error below:
[error] /home/hadoop/dev/first/src/main/scala/SparkMeApp.scala:4: object xml is not a member of package com.databricks.spark
[error] import com.databricks.spark.xml
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 9 s, completed Sep 22, 2017 4:11:19 PM
Do I need to add any other jar files related to XML? Please suggest, and please share a link that explains which jar files are needed for different file formats.
Because you're using Scala 2.11 and Spark 2.0, change the dependencies in build.sbt as follows (a sketch of the resulting build.sbt follows the list):

- change the spark-avro version to 3.2.0: https://github.com/databricks/spark-avro#requirements
- add "com.databricks" %% "spark-xml" % "0.4.1", the library that provides the com.databricks.spark.xml data source: https://github.com/databricks/spark-xml#scala-211
- change the scala-xml version to 1.0.6, the current version for Scala 2.11: http://mvnrepository.com/artifact/org.scala-lang.modules/scala-xml_2.11
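Putting those changes together, the dependency section of build.sbt would look roughly like this (a sketch; it keeps your existing Spark version and uses %% so sbt picks the _2.11 artifacts matching scalaVersion):

name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"

// Spark itself stays on 2.0.0
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"

// Versions compatible with Scala 2.11 / Spark 2.0, per the links above
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.6"

resolvers += Resolver.mavenLocal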
In your code, delete the following import statement, which is what triggers the compile error (the com.databricks.spark.xml data source is referenced by name in read.format, so no import is needed):

import com.databricks.spark.xml
Note that your code doesn't actually use the spark-avro or scala-xml libraries. Remove those dependencies from your build.sbt (and the import scala.xml._ statement from your code) if you're not going to use them.
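For reference, a trimmed-down version of SparkMeApp.scala with the unused imports removed might look like this (a sketch based on your original code, still taking the XML file path as the first program argument):

package in.goai.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object SparkMeApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Path to the XML file passed on the command line
    val fileName = args(0)

    // spark-xml is selected via the format string; each <book> element becomes a row
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load(fileName)

    df.select("title", "price").show()
  }
}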