How to add Hive properties at runtime in spark-shell

Posted 2020-03-03 06:21

Question:

How do you set a Hive property like hive.metastore.warehouse.dir at runtime? Or at least, is there a more dynamic way of setting such a property than putting it in a file like spark_home/conf/hive-site.xml?

Answer 1:

I faced the same issue, and for me it worked to set the Hive properties from Spark (2.4.0). Below are all the options, through spark-shell, spark-submit, and SparkConf.

Option 1 (spark-shell)

spark-shell --conf spark.hadoop.hive.metastore.warehouse.dir=some_path\metastore_db_2

Initially I tried spark-shell with hive.metastore.warehouse.dir set to some_path\metastore_db_2, and got the following warning:

Warning: Ignoring non-spark config property: hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2

However, when I created a Hive table with:

bigDf.write.mode("overwrite").saveAsTable("big_table")

The Hive metadata were stored correctly under the metastore_db_2 folder.

When I used spark.hadoop.hive.metastore.warehouse.dir instead, the warning disappeared and the results were still saved in the metastore_db_2 directory.
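
If you want to verify from inside spark-shell that the value was actually picked up, a minimal check (just a sketch, assuming the shell was started with the --conf shown above) is to read it back from the session's Hadoop configuration:

// spark.hadoop.* keys are forwarded to the Hadoop configuration with the prefix stripped,
// so the value should be visible here.
spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir")

// The Spark-side warehouse location can be inspected the same way.
spark.conf.get("spark.sql.warehouse.dir")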

Option 2 (spark-submit)

In order to use hive.metastore.warehouse.dir when submitting a job with spark-submit, I followed the steps below.

First I wrote some code to save some random data with Hive:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf().setAppName("metastore_test").setMaster("local")
val spark = SparkSession.builder().config(sparkConf).getOrCreate()

import spark.implicits._

// Some random data to persist through the metastore.
val dfA = spark.createDataset(Seq(
      (1, "val1", "p1"),
      (2, "val1", "p2"),
      (3, "val2", "p3"),
      (3, "val3", "p4"))).toDF("id", "value", "p")

// Write the data as a table, then read it back to verify.
dfA.write.mode("overwrite").saveAsTable("metastore_test")

spark.sql("select * from metastore_test").show(false)

Next I submitted the job with:

spark-submit --class org.tests.Main \
        --conf spark.hadoop.hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2 \
        spark-scala-test_2.11-0.1.jar

The metastore_test table was properly created under the C:\winutils\hadoop-2.7.1\bin\metastore_db_2 folder.
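
To confirm that the metastore at the custom location is actually reused, one quick check (a sketch, assuming a later spark-shell or job is started with the same --conf value) is to read the table back:

// The table should still be listed because its metadata lives under metastore_db_2.
spark.sql("show tables").show(false)

// Reading it back confirms the metadata survived the first job.
spark.table("metastore_test").show(false)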

Option 3 (SparkConf)

Set the property on the SparkConf in the Spark code:

val sparkConf = new SparkConf()
      .setAppName("metastore_test")
      .set("spark.hadoop.hive.metastore.warehouse.dir", "C:\\winutils\\hadoop-2.7.1\\bin\\metastore_db_2")
      .setMaster("local")

This attempt was successful as well.
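
For completeness, the SparkConf is then handed to the session builder exactly as in the Option 2 code (a minimal sketch):

import org.apache.spark.sql.SparkSession

// Build the session from the SparkConf that carries the spark.hadoop.* property;
// tables written through this session end up under the configured warehouse directory.
val spark = SparkSession.builder().config(sparkConf).getOrCreate()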

The question that still remains is: why do I have to prefix the property with spark.hadoop for it to work as expected?