I am trying to load a plain text file into a Hive table using Spark. I am using Spark version 2.0.2. I did this successfully in Spark 1.6.0 and am now trying to do the same in 2.x.
I executed the below steps:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("SparkHiveLoad").master("local").enableHiveSupport().getOrCreate()
import spark.implicits._
There is no problem up to this point.
But when I try to load the file into Spark:
val partfile = spark.read.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/partfile")
I am getting an exception:
Caused by: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/cloudera/metastore_db.
The default property in core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
</property>
There were no other Hive or Spark sessions running in the background.
I have seen other questions with the same exception. Please read this one fully, and if you still think it is a duplicate, you can mark it.
Could anyone tell me how I can fix it?
In Spark 2.0.2, spark.sparkContext.textFile is generally used to read a text file.
The Scala interface for Spark SQL supports automatically converting an RDD containing case classes to a DataFrame. The case class defines the schema of the table. The names of the arguments to the case class are read using reflection and become the names of the columns. Case classes can also be nested or contain complex types such as Seqs or Arrays. This RDD can be implicitly converted to a DataFrame and then be registered as a table. Tables can be used in subsequent SQL statements.
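The split-and-map step behind that conversion can be sketched in plain Scala, with no Spark required (Employee and the sample lines here are illustrative, not from the question):

```scala
// Illustrative case class: with Spark, its field names would become
// the DataFrame column names via reflection
case class Employee(name: String, age: Int)

// Parse one comma-separated line into a case class instance,
// mirroring the split/map steps used in the Spark sample below
def parseEmployee(line: String): Employee = {
  val fields = line.split(",")
  Employee(fields(0), fields(1).trim.toInt)
}
```

In the Spark version, the same mapping runs over every line of the RDD before `.toDF()` is called.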
Sample code:
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
// Case class defining the schema; its field names become the column names
case class Person(name: String, age: Long)
// For implicit conversions from RDDs to DataFrames
import spark.implicits._
// Create an RDD of Person objects from a text file, convert it to a Dataframe
val peopleDF = spark.sparkContext
.textFile("examples/src/main/resources/people.txt")
.map(_.split(","))
.map(attributes => Person(attributes(0), attributes(1).trim.toInt))
.toDF()
// Register the DataFrame as a temporary view
peopleDF.createOrReplaceTempView("people")
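Once registered, the view can be queried with SQL in the same session (a short sketch; it assumes the SparkSession `spark` and the `people` view from the snippet above, and `teenagersDF` is just an illustrative name):

```scala
// Run SQL against the temporary view registered above;
// the column names come from the Person case class fields
val teenagersDF = spark.sql("SELECT name, age FROM people WHERE age BETWEEN 13 AND 19")
teenagersDF.show()
```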
Please refer to the Spark SQL documentation for more information and to check other options as well.