I am new to Spark and Hive. I am running Spark v1.0.1 with built-in Hive (Spark was built with SPARK_HIVE=true sbt/sbt assembly/assembly).
I also configured Hive to store its metastore in a PostgreSQL database, following the instructions. I could configure standalone Hive (not the one built into Spark) to use PostgreSQL, but I don't know how to get it working with the Hive inside Spark.
In the instructions, I see that I need to copy or symlink postgresql-jdbc.jar into hive/lib so that Hive can pick up the PostgreSQL JDBC driver when it runs:
$ sudo yum install postgresql-jdbc
$ ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar
With the built-in Hive in Spark, where should I put postgresql-jdbc.jar to get it to work?
I found the solution to my problem. I need to add the jar to the CLASSPATH for Spark so that the built-in Hive can use postgresql-jdbc4.jar.
I added 3 environment variables:
SPARK_CLASSPATH is used for spark-shell
SPARK_SUBMIT_CLASSPATH is used for spark-submit (I am not sure)
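A minimal sketch of how these could be set, assuming the driver jar sits at /usr/share/java/postgresql-jdbc4.jar (a placeholder path; adjust it to wherever the jar lives on your system) and that the exports go in $SPARK_HOME/conf/spark-env.sh:

# in $SPARK_HOME/conf/spark-env.sh
# placeholder jar path; adjust for your system
export SPARK_CLASSPATH=/usr/share/java/postgresql-jdbc4.jar        # picked up by spark-shell
export SPARK_SUBMIT_CLASSPATH=/usr/share/java/postgresql-jdbc4.jar # picked up by spark-submit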
Now I can use spark-shell with the built-in Hive, configured to use the metastore in PostgreSQL.
You have two options:

If you want to use your own Hive: put your hive-site.xml (or make a symlink to it) under $SPARK_HOME/conf/hive-site.xml.
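For example, assuming your existing Hive configuration lives at /etc/hive/conf/hive-site.xml (a placeholder path; adjust to your installation):

$ ln -s /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml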
If you want to use the built-in Hive: you need to modify $SPARK_HOME/hive-<version>/conf/hive-site.xml.

Inside the hive-site.xml you need to modify the javax.jdo.option.* values, along the lines of the following:
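A minimal sketch of those properties for a PostgreSQL-backed metastore, assuming a database named metastore on localhost with a hiveuser account (host, port, database name, user, and password are all placeholders; substitute your own):

<!-- placeholder values; substitute your own host, database, user, and password -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>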