I'm getting the following error when I try to run this code.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        String warehouseLocation = "file:" + System.getProperty("user.dir") + "spark-warehouse";
        SparkSession spark = SparkSession
                .builder().master("local")
                .appName("Java Spark Hive Example")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .enableHiveSupport()
                .getOrCreate();

        String path = "/home/cloudera/Downloads/NetBeansProjects/sparksql1/src/test/Employee.json";

        spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        spark.sql("LOAD DATA LOCAL INPATH '" + path + "' INTO TABLE src");

        //load from HDFS
        Dataset<Row> df = spark.read().json(path);
        df.registerTempTable("temp_table");
        spark.sql("create table TEST.employee as select * from temp_table");

        df.printSchema();
        df.show();
    }
}
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
at com.training.hivetest.App.main(App.java:21)
How can it be resolved?
Add the following dependency to your Maven project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
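With that artifact on the classpath (the _2.11 suffix and the version should match your Scala and Spark build), enableHiveSupport() should no longer throw. A minimal smoke test; the class name, app name, and warehouse path below are just placeholders I picked for illustration:

import org.apache.spark.sql.SparkSession;

public class HiveSupportSmokeTest {
    public static void main(String[] args) {
        // If spark-hive is missing, this builder throws the IllegalArgumentException
        // from the question; with it on the classpath the session starts normally.
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("Hive support smoke test")
                .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
                .enableHiveSupport()
                .getOrCreate();

        spark.sql("SHOW TABLES").show(); // simple query against the Hive catalog
        spark.stop();
    }
}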
I've looked into the source code and found that, besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to instantiate the SparkSession. HiveConf is not contained in the spark-hive*.jar; you can find it in the Hive-related JARs and put it on your classpath.
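If you want to see which of the two is actually missing, you can mirror the check Spark performs before enabling Hive support. A rough probe, assuming the Spark 2.0/2.1 class names (newer versions use a different session-state class):

public class HiveClasspathProbe {
    public static void main(String[] args) {
        // enableHiveSupport() only proceeds if both classes can be loaded:
        // the first ships in spark-hive, the second in the Hive jars themselves.
        String[] required = {
                "org.apache.spark.sql.hive.HiveSessionState",
                "org.apache.hadoop.hive.conf.HiveConf"
        };
        for (String className : required) {
            try {
                Class.forName(className);
                System.out.println("found   : " + className);
            } catch (ClassNotFoundException e) {
                System.out.println("missing : " + className);
            }
        }
    }
}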
I had the same problem and could resolve it by adding the following dependencies. (I put this list together by referring to the compile-dependencies section of the spark-hive_2.11 page on the Maven repository):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
where scala.binary.version = 2.11 and spark.version = 2.1.0:
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>
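With those dependencies resolved, the program from the question should run as well. A trimmed-down sketch of the same flow, using a placeholder JSON path and createOrReplaceTempView (registerTempTable is deprecated in Spark 2.x):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmployeeToHive {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("Employee to Hive")
                .enableHiveSupport()
                .getOrCreate();

        // Read the JSON file and expose it as a temporary view.
        Dataset<Row> df = spark.read().json("/path/to/Employee.json");
        df.createOrReplaceTempView("temp_table");

        // Materialize the view as a Hive-managed table.
        spark.sql("create table employee as select * from temp_table");

        df.printSchema();
        df.show();
        spark.stop();
    }
}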