I want to read CSV files using the latest Apache Spark version, i.e. 2.2.1, on Windows 7 via cmd, but I am unable to do so because there is some problem with the metastore_db. I tried the steps below:
1. spark-shell --packages com.databricks:spark-csv_2.11:1.5.0 // since my Scala version is 2.11
2. val df = spark.read.format("csv").option("header", "true").option("mode", "DROPMALFORMED").load("file:///D:/ResourceData.csv") // in the latest versions we use the SparkSession variable spark instead of sqlContext
but it throws the following error:
Caused by: org.apache.derby.iapi.error.StandardException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader
Caused by: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database
I am able to read CSV files in version 1.6, but I want to do it in the latest version. Can anyone help me with this? I have been stuck for many days.
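For context, this Derby error usually means that another spark-shell (possibly a crashed one) still holds the lock on the local metastore directory. A common workaround is to close all other shells and remove the stale lock file — a sketch, assuming the shell was started in the current directory (adjust the path as needed):

```shell
# db.lck is Derby's lock file inside the metastore_db directory that
# spark-shell creates where it is launched. Remove it only when no other
# spark-shell is running, then restart the shell.
rm metastore_db/db.lck        # on Windows cmd: del metastore_db\db.lck
```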
Open Spark Shell
spark-shell
Create an SQLContext from the Spark context and assign it to the sqlContext variable
val sqlContext = new org.apache.spark.sql.SQLContext(sc) // the Spark context is available as 'sc'
Read the CSV file as per your requirement
val bhaskar = sqlContext.read.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load("/home/burdwan/Desktop/bhaskar.csv") // with a wildcard, e.g. ...Desktop/*.csv, we can load multiple CSV files in a single call
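The inferSchema option asks Spark to sample each column and guess a data type instead of treating everything as a string. As a rough standalone sketch of that idea (an illustration only — InferSketch and its rules are hypothetical, not Spark's actual implementation):

```scala
// Hypothetical sketch of per-column type inference, illustrating what
// .option("inferSchema", "true") does conceptually (not Spark's real code).
object InferSketch {
  def inferType(values: Seq[String]): String =
    if (values.forall(v => v.matches("-?\\d+"))) "IntegerType"
    else if (values.forall(v => v.matches("-?\\d+(\\.\\d+)?"))) "DoubleType"
    else "StringType"
}

// e.g. a column of "1", "2", "3" infers as IntegerType,
// "61.5", "59.8" as DoubleType, and "Ideal", "Premium" as StringType.
```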
Collect the DataFrame rows and print them
bhaskar.collect.foreach(println)
Output
_a1 _a2 Cn clr clarity depth aprx price x y z
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Good J VVS2 63 57 336 3.94 3.96 2.48
Finally, even this worked only on a Linux-based OS. Download Apache Spark from the official documentation and set it up using this link. Just verify that you are able to invoke spark-shell. Now enjoy loading and performing actions on any type of file with the latest Spark version. I don't know why it's not working on Windows, even though I am running it for the first time.