Cannot Create table with spark SQL : Hive support

2019-08-16 02:32发布

问题:

I'm trying to create a table in spark (scala) and then insert values from two existing dataframes but I got this exeption:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);;
'CreateTable `stat_type_predicate_percentage`, ErrorIfExists 

Here is the code :

case class stat_type_predicate_percentage (type1: Option[String], predicate: Option[String], outin: Option[INT], percentage: Option[FLOAT])
object LoadFiles1 {

 def main(args: Array[String]) {
    val sc = new SparkContext("local[*]", "LoadFiles1") 
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val warehouseLocation = new File("spark-warehouse").getAbsolutePath
    val spark = SparkSession
        .builder()
        .appName("Spark Hive Example")
        .config("spark.sql.warehouse.dir", warehouseLocation)
        .enableHiveSupport()
        .getOrCreate()       

import sqlContext.implicits._    
import org.apache.spark.sql._       
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};

//statistics 
val create = spark.sql("CREATE TABLE stat_type_predicate_percentage (type1 String, predicate String, outin INT, percentage FLOAT) USING hive")
val insert1 = spark.sql("INSERT INTO stat_type_predicate_percentage SELECT types.type, res.predicate, 0, 1.0*COUNT(subject)/(SELECT COUNT(subject) FROM MappingBasedProperties AS resinner WHERE res.predicate = resinner.predicate) FROM MappingBasedProperties AS res, MappingBasedTypes AS types WHERE res.subject = types.resource GROUP BY res.predicate,types.type")

val select = spark.sql("SELECT * from stat_type_predicate_percentage" ) 
  }

How should I solve it?

回答1:

This problem may be two fold for one you might want to do what @Tanjin suggested in the comments and it might work afterwards ( Try adding .config("spark.sql.catalogImplementation","hive") to your SparkSession.builder ) but if you actually want to use an existing hive instance with its own metadata which you'll be able to query from outside your job. Or you might already want to use existing tables you might like to add to you configuration the hive-site.xml.

This configuration file contains some properties you probably want like the hive.metastore.uris which will enable your context add a new table which will be save in the store. And it will be able to read from tables in your hive instance thanks to the metastore which contains tables and locations.