Spark Sql - Insert Into External Hive Table Error

I am trying to insert data into a external hive table through spark sql. My hive table is bucketed via a column. The query to create the external hive table is this

create external table tab1 ( col1 type,col2 type,col3 type) clustered by (col1,col2) sorted by (col1) into 8 buckets stored as parquet

Now I tried to store data from a parquet file (stored in hdfs) into the table. This is my code

    SparkSession session = SparkSession.builder().appName("ParquetReadWrite").
                    config("hive.exec.dynamic.partition", "true").
                    config("hive.exec.dynamic.partition.mode", "nonstrict").
                    config("hive.execution.engine","tez").
                    config("hive.exec.max.dynamic.partitions","400").
                    config("hive.exec.max.dynamic.partitions.pernode","400").
                    config("hive.enforce.bucketing","true").
                    config("optimize.sort.dynamic.partitionining","true").
                    config("hive.vectorized.execution.enabled","true").
                    config("hive.enforce.sorting","true").
                    enableHiveSupport()
                    .master(args[0]).getOrCreate();
String insertSql="insert into tab1 select * from"+"'"+parquetInput+"'";

session.sql(insertSql);

When I run the code , its throwing the below error

mismatched input ''hdfs://url:port/user/clsadmin/somedata.parquet'' expecting (line 1, pos 50)

== SQL == insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet' --------------------------------------------------^^^
```
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
```
What is the difference between using the hive execution engine as Tez and Spark ?

标签： apache-spark hive apache-spark-sql parquet

2条回答

何必那么认真

2楼-- · 2019-08-27 17:59

Creating external table in Hive, HDFS location to be specified.

create external table tab1 ( col1 type,col2 type,col3 type) 
clustered by (col1,col2) sorted by (col1) into 8 buckets 
stored as parquet 
LOCATION hdfs://url:port/user/clsadmin/tab1

There won't be necessity that hive will populate the data, either same application or other application can ingest the data into the location and hive will access the data by defining the schema top of the location.

*== SQL == insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet' --------------------------------------------------^^^*

parquetInput is parquet HDFS file path and not Hive table name. Hence the error.

There are two ways you can solve this issue:

Define the external table for "parquetInput" and give the table name
Use LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1

0人赞添加讨论(0) 举报

我命由我不由天

3楼-- · 2019-08-27 18:01

Have you tried

LOAD DATA LOCAL INPATH '/path/to/data'

OVERWRITE INTO TABLE tablename;

0人赞添加讨论(0) 举报

Spark Sql - Insert Into External Hive Table Error

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间