I want to read CSV files using the latest Apache Spark version, i.e. 2.2.1, on Windows 7 via cmd, but I am unable to do so because there is some problem with the metastore_db. I tried the steps below:
1. spark-shell --packages com.databricks:spark-csv_2.11:1.5.0 // since my Scala version is 2.11
2. val df = spark.read.format("csv").option("header", "true").option("mode", "DROPMALFORMED").load("file:///D:/ResourceData.csv") // in the latest versions we use the SparkSession variable spark instead of sqlContext
but it throws the following error:
Caused by: org.apache.derby.iapi.error.StandardException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader
Caused by: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database
I am able to read CSV files in version 1.6, but I want to do it in the latest version. Can anyone help me with this? I have been stuck for many days.
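For context, this Derby error usually means that another spark-shell (possibly a crashed one) still holds the lock on the local metastore directory. A common workaround is to close all other shells and remove the stale lock file — a sketch, assuming the shell was started in the current directory (adjust the path as needed):

```shell
# db.lck is Derby's lock file inside the metastore_db directory that
# spark-shell creates where it is launched. Remove it only when no other
# spark-shell is running, then restart the shell.
rm metastore_db/db.lck        # on Windows cmd: del metastore_db\db.lck
```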
Open Spark Shell
spark-shell
Create an SQLContext from the Spark context and assign it to the sqlContext variable
val sqlContext = new org.apache.spark.sql.SQLContext(sc) // the Spark context is available as 'sc'
Read the CSV file as per your requirement
val bhaskar = sqlContext.read.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load("/home/burdwan/Desktop/bhaskar.csv") // with a wildcard, e.g. ...Desktop/*.csv, we can load multiple CSV files in a single call
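The inferSchema option asks Spark to sample each column and guess a data type instead of treating everything as a string. As a rough standalone sketch of that idea (an illustration only — InferSketch and its rules are hypothetical, not Spark's actual implementation):

```scala
// Hypothetical sketch of per-column type inference, illustrating what
// .option("inferSchema", "true") does conceptually (not Spark's real code).
object InferSketch {
  def inferType(values: Seq[String]): String =
    if (values.forall(v => v.matches("-?\\d+"))) "IntegerType"
    else if (values.forall(v => v.matches("-?\\d+(\\.\\d+)?"))) "DoubleType"
    else "StringType"
}

// e.g. a column of "1", "2", "3" infers as IntegerType,
// "61.5", "59.8" as DoubleType, and "Ideal", "Premium" as StringType.
```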
Collect the DataFrame rows and print them
bhaskar.collect.foreach(println)
Output
_a1 _a2 Cn clr clarity depth aprx price x y z
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Good J VVS2 63 57 336 3.94 3.96 2.48
Finally, even this worked only on a Linux-based OS. Download Apache Spark from the official documentation and set it up using this link. Just verify that you are able to invoke spark-shell. Now enjoy loading and performing actions on any type of file with the latest Spark version. I don't know why it's not working on Windows, even though I am running it for the first time.