Run Spark with built-in Hive and Configuring a rem

Posted 2019-05-10 10:45

Question:

I am new to Spark and Hive. I am running Spark v1.0.1 with the built-in Hive (Spark was built with SPARK_HIVE=true sbt/sbt assembly/assembly).

I also configured Hive to store its Metastore in a PostgreSQL database, following these instructions:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

I was able to configure a standalone Hive (not the one built into Spark) to use PostgreSQL, but I don't know how to get this working with the Hive built into Spark.

The instructions say to copy or link postgresql-jdbc.jar into hive/lib so that Hive can load the PostgreSQL JDBC driver when it runs:

$ sudo yum install postgresql-jdbc
$ ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar

With the built-in Hive in Spark, where should I put postgresql-jdbc.jar to get it working?

Answer 1:

I found the solution to my problem. I needed to add the driver to Spark's classpath so that the built-in Hive could use postgresql-jdbc4.jar.

I added three environment variables:

export CLASSPATH="$CLASSPATH:/usr/share/java/postgresql-jdbc4.jar"
export SPARK_CLASSPATH=$CLASSPATH
export SPARK_SUBMIT_CLASSPATH=$CLASSPATH

SPARK_CLASSPATH is used by spark-shell.

SPARK_SUBMIT_CLASSPATH is used by spark-submit (I am not sure about this one).

Now I can use spark-shell with the built-in Hive, which is configured to keep its Metastore in Postgres.
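
One caveat with the exports above: if CLASSPATH starts out empty, the resulting value begins with a colon, and Java generally treats an empty classpath entry as the current directory. Below is a minimal sketch of a variant that avoids this and also skips duplicates, assuming a POSIX shell. add_jar is a hypothetical helper name (not part of Spark), and the jar path is the one used in this answer.

```shell
# add_jar CURRENT JAR
# Prints CURRENT with JAR appended, unless JAR is already listed.
# Hypothetical helper for illustration; not a Spark or Hive command.
add_jar() {
  current="$1"
  jar="$2"
  case ":$current:" in
    *":$jar:"*) printf '%s\n' "$current" ;;       # already on the classpath
    "::")       printf '%s\n' "$jar" ;;           # classpath was empty
    *)          printf '%s\n' "$current:$jar" ;;  # append with a separator
  esac
}

# Path taken from the answer above; adjust it to your system.
JDBC_JAR=/usr/share/java/postgresql-jdbc4.jar

SPARK_CLASSPATH=$(add_jar "${SPARK_CLASSPATH:-}" "$JDBC_JAR")
SPARK_SUBMIT_CLASSPATH=$(add_jar "${SPARK_SUBMIT_CLASSPATH:-}" "$JDBC_JAR")
export SPARK_CLASSPATH SPARK_SUBMIT_CLASSPATH
```

Sourcing this from a shell profile more than once should leave the variables unchanged after the first run, since the jar is only appended when it is missing.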



Answer 2:

You have two options:

  1. You can continue to use your own Hive installation. Put a copy of its hive-site.xml (or a symlink to it) at $SPARK_HOME/conf/hive-site.xml.
  2. If you want to use the built-in Hive, modify $SPARK_HOME/hive-<version>/conf/hive-site.xml.
    Inside hive-site.xml, set the javax.jdo.option.* values along the lines of the following:

    <property>
      <name>hive.metastore.local</name>
      <value>true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:postgresql://localhost:5432/hivedb</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>org.postgresql.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>******</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>******</value>
    </property>
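
For option 1, the symlink could be created roughly as follows. This is only a sketch, assuming a POSIX shell; link_hive_site is a hypothetical helper name, and both directories are passed in by the caller because the location of your Hive configuration directory is not given in this thread.

```shell
# link_hive_site HIVE_CONF_DIR SPARK_HOME
# Symlinks HIVE_CONF_DIR/hive-site.xml to SPARK_HOME/conf/hive-site.xml.
# Hypothetical helper for illustration; pass HIVE_CONF_DIR as an absolute
# path, otherwise the symlink target will not resolve.
link_hive_site() {
  hive_conf_dir="$1"
  spark_home="$2"
  src="$hive_conf_dir/hive-site.xml"

  # Fail early rather than create a dangling link.
  if [ ! -f "$src" ]; then
    echo "hive-site.xml not found in $hive_conf_dir" >&2
    return 1
  fi

  mkdir -p "$spark_home/conf"
  # -f replaces an existing file or link at the destination.
  ln -sf "$src" "$spark_home/conf/hive-site.xml"
}
```

A call might look like link_hive_site /etc/hive/conf "$SPARK_HOME", where /etc/hive/conf is an assumed location; substitute the directory that actually holds your hive-site.xml. The PostgreSQL JDBC jar still has to be on Spark's classpath for either option.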