How to create SparkSession with Hive support (fails with "Hive classes are not found")

Posted 2019-01-15 07:10

Question:

I'm getting the following error when I try to run this code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App
{
    public static void main(String[] args) throws Exception {
        // Note the "/" before "spark-warehouse"; without it the two path
        // segments run together and the warehouse ends up in the wrong place.
        String warehouseLocation = "file:" + System.getProperty("user.dir") + "/spark-warehouse";
        SparkSession spark = SparkSession
            .builder().master("local")
            .appName("Java Spark Hive Example")
            .config("spark.sql.warehouse.dir", warehouseLocation)
            .enableHiveSupport()
            .getOrCreate();

        String path = "/home/cloudera/Downloads/NetBeansProjects/sparksql1/src/test/Employee.json";

        spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        spark.sql("LOAD DATA LOCAL INPATH '" + path + "' INTO TABLE src");

        // Load the JSON file from the local filesystem into a DataFrame.
        Dataset<Row> df = spark.read().json(path);

        // registerTempTable is deprecated in Spark 2.x;
        // createOrReplaceTempView is the direct replacement.
        df.createOrReplaceTempView("temp_table");

        // Assumes the TEST database already exists in the metastore.
        spark.sql("create table TEST.employee as select * from temp_table");

        df.printSchema();
        df.show();
    }
}

Output:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)

How can it be resolved?

Answer 1:

Add the following dependency to your Maven project:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
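
The artifact version should match the Spark version your project already builds against (2.0.0 here; substitute yours). Once spark-hive is on the classpath, a minimal sketch like the following should start without the exception and report "hive" as the catalog implementation (the class name is illustrative):

import org.apache.spark.sql.SparkSession;

public class HiveSupportCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local")
            .appName("HiveSupportCheck")
            .enableHiveSupport() // throws IllegalArgumentException if spark-hive is missing
            .getOrCreate();
        // With Hive support enabled this prints "hive" rather than "in-memory".
        System.out.println(spark.conf().get("spark.sql.catalogImplementation"));
        spark.stop();
    }
}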


Answer 2:

I've looked into the source code and found that, in addition to HiveSessionState (from spark-hive), another class, HiveConf, is also needed to initiate the SparkSession. HiveConf is not contained in the spark-hive*.jar; you may find it in the Hive-related jars and put it on your classpath.
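
To confirm which of those classes are actually visible before building the session, a quick reflection check works; a minimal sketch (the two class names are the ones this answer and the stack trace point at):

public class ClasspathCheck {
    public static void main(String[] args) {
        // HiveSessionState ships in spark-hive; HiveConf comes from the Hive jars.
        String[] required = {
            "org.apache.spark.sql.hive.HiveSessionState",
            "org.apache.hadoop.hive.conf.HiveConf"
        };
        for (String cls : required) {
            try {
                Class.forName(cls);
                System.out.println("found:   " + cls);
            } catch (ClassNotFoundException e) {
                System.out.println("missing: " + cls);
            }
        }
    }
}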



Answer 3:

I had the same problem and resolved it by adding the following dependencies. (I assembled this list by referring to the compile-dependencies section of the spark-hive_2.11 page on the Maven repository.)

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>

where scala.binary.version = 2.11 and spark.version = 2.1.0 are defined as Maven properties:

<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>
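
With those dependencies resolved, a quick end-to-end smoke test is to create a Hive table and list the catalog; a minimal sketch (the class and table names are arbitrary):

import org.apache.spark.sql.SparkSession;

public class MetastoreSmokeTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local")
            .appName("MetastoreSmokeTest")
            .enableHiveSupport()
            .getOrCreate();

        // Creating a managed table goes through hive-exec and hive-metastore,
        // so this fails fast if any of the jars above are still missing.
        spark.sql("CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING)");
        spark.sql("SHOW TABLES").show();
        spark.stop();
    }
}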