I am trying to load a plain text file into a Hive table using Spark. I am using Spark version 2.0.2. I did this successfully in Spark 1.6.0 and am now trying to do the same in 2.x. I executed the steps below:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("SparkHiveLoad").master("local").enableHiveSupport().getOrCreate()
import spark.implicits._
There is no problem up to this point. But when I try to load the file into Spark:
val partfile = spark.read.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/partfile")
I am getting an exception:
Caused by: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/cloudera/metastore_db.
The default property in core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
</property>
There were no other Hive or Spark sessions running in the background. I have seen other questions with the same exception, so please read this one fully; if you still think it is a duplicate, you can mark it.
Could anyone tell me how I can fix this?
In Spark 2.0.2
spark.sparkContext.textFile
is generally used to read a text file. The Scala interface for Spark SQL supports automatically converting an RDD containing case classes to a DataFrame. The case class defines the schema of the table: the names of the arguments to the case class are read using reflection and become the names of the columns. Case classes can also be nested or contain complex types such as Seqs or Arrays. Such an RDD can be implicitly converted to a DataFrame and then registered as a table, and that table can be used in subsequent SQL statements.
Sample code:
Please refer to the Spark documentation for more information and to check other options as well.