I am trying to insert data into an external Hive table through Spark SQL. My Hive table is bucketed by column. This is the query I used to create the external Hive table:
create external table tab1 (
    col1 type,
    col2 type,
    col3 type
)
clustered by (col1, col2)
sorted by (col1)
into 8 buckets
stored as parquet
Now I am trying to load data from a Parquet file (stored in HDFS) into the table. This is my code:
SparkSession session = SparkSession.builder()
        .appName("ParquetReadWrite")
        .config("hive.exec.dynamic.partition", "true")
        .config("hive.exec.dynamic.partition.mode", "nonstrict")
        .config("hive.execution.engine", "tez")
        .config("hive.exec.max.dynamic.partitions", "400")
        .config("hive.exec.max.dynamic.partitions.pernode", "400")
        .config("hive.enforce.bucketing", "true")
        .config("hive.optimize.sort.dynamic.partition", "true")
        .config("hive.vectorized.execution.enabled", "true")
        .config("hive.enforce.sorting", "true")
        .enableHiveSupport()
        .master(args[0])
        .getOrCreate();
String insertSql = "insert into tab1 select * from " + "'" + parquetInput + "'";
session.sql(insertSql);
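If it matters, the workaround I am considering is to load the file through the DataFrame API and insert from a temporary view instead of referencing the path in SQL. This is only a sketch reusing session and parquetInput from the code above; parquet_staging is just a name I picked, and I am assuming the file's schema matches the table:

// requires: import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row;
Dataset<Row> df = session.read().parquet(parquetInput); // read the Parquet file from HDFS
df.createOrReplaceTempView("parquet_staging");          // expose it to Spark SQL
session.sql("insert into tab1 select * from parquet_staging");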
When I run the code, it throws the error below:
mismatched input ''hdfs://url:port/user/clsadmin/somedata.parquet'' expecting (line 1, pos 50)
== SQL ==
insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet'
--------------------------------------------------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
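I also came across Spark SQL's syntax for querying a file directly, which uses backticks around the path rather than single quotes. I am not sure whether this is the correct form as the source of an insert ... select, so this is just a guess:

session.sql("insert into tab1 select * from parquet.`hdfs://url:port/user/clsadmin/somedata.parquet`");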
Also, what is the difference between using Tez and using Spark as the Hive execution engine?