Spark SQL doesn't let me create a table, complains

Posted 2019-09-02 17:58

Question:

I installed Spark 1.5 (spark-1.5.0-bin-hadoop2.6) on my local machine and ran $ ./bin/spark-shell. Then, following the docs, I tried to create a table and got this:

> SQL context available as sqlContext.
> 
> scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value
> STRING)"); 15/09/22 22:18:13 ERROR DDLTask:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> MetaException(message:file:/user/hive/warehouse/src is not a directory
> or unable to create one)  at
> org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:720)

I tried passing the Hive parameter for this, but it didn't work:

> $  ./bin/spark-shell --conf hive.metastore.warehouse.dir=./ Warning:
> Ignoring non-spark config property: hive.metastore.warehouse.dir=./

Finally I tried the CLI itself, but I get the same issue. Where do I change the Hive warehouse location parameter? I don't have Hadoop installed at the moment, nor Hive.

Thanks, Matt

Answer 1:

The metadata of Hive tables is stored in the metastore; HiveContext adds support for finding tables in the metastore.

import org.apache.spark.sql.hive.HiveContext

// HiveContext reads table metadata from the Hive metastore
val hiveContext = new HiveContext(sc)
val myDF = hiveContext.sql("select * from mytable")

You will get a DataFrame as the result:

myDF: org.apache.spark.sql.DataFrame = [.....]
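
With the HiveContext in place, the CREATE TABLE statement from the question works the same way. A minimal sketch, assuming the warehouse directory issue from the question has been resolved (see the other answers); src is the table name from the question and srcDF is just an illustrative name:

// create the table through the Hive metastore, then query it back
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
val srcDF = hiveContext.sql("select * from src")
srcDF.show()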


Answer 2:

I hit this problem when spark-shell didn't have permission to write to /user/hive/warehouse:

  1. Run sudo spark-shell to try again. If it works, do the second step.
  2. Change the permissions on the directory so that the user running the plain spark-shell command can write to it (see the sketch below).
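
For example, a minimal sketch of step 2, assuming the warehouse path from the error message above and that your own user should own it:

$ sudo mkdir -p /user/hive/warehouse
$ sudo chown -R $(whoami) /user/hive/warehouse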


Answer 3:

Actually, you don't need to have Hive installed (nor Hadoop), but you do need a hive-site.xml present on your Spark classpath (the simplest way is to add hive-site.xml to your Spark conf directory).

Here is a simple default hive-site.xml:

<configuration>
<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby:;databaseName=/PATH/TO/YOUR/METASTORE/DIR/metastore_db;create=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>org.apache.derby.jdbc.EmbeddedDriver</value>
   <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/PATH/TO/YOUR/WAREHOUSE/DIR/</value>
    <description>location of default database for the warehouse</description>
</property>
</configuration>
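
To get the file onto the classpath, copying it into Spark's conf directory is enough. A minimal sketch, assuming you run it from inside the unpacked spark-1.5.0-bin-hadoop2.6 directory:

$ cp hive-site.xml ./conf/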

Sometimes, when the metastore is a local Derby database, it can be left with stale lock files. If you are experiencing a problem with metastore locks, you can delete them (make sure you are the only one using the metastore first ;) ):

$ rm  /PATH/TO/YOUR/METASTORE/DIR/metastore_db/*.lck