Add column in Hive not allowed from Scala/Spark code


Question:

I am trying to add a column to a Hive table when the source data has new columns. The detection of new columns works well; however, when I try to add the column to the destination table with the following code, I receive the error shown below:

// Loop over the source DataFrame's schema; `chk` (defined elsewhere)
// holds the missing column's name wrapped in brackets
for (f <- df.schema.fields) {
  if ("[" + f.name + "]" == chk) {
    // Hive has no `integer` type, so map Spark's type name to `int`
    spark.sqlContext.sql("alter table dbo_nwd_orders add columns (" +
      f.name + " " + f.dataType.typeName.replace("integer", "int") + ")")
  }
}

Error:

WARN HiveExternalCatalog: Could not alter schema of table  `default`.`dbo_nwd_orders` in a Hive compatible way. Updating Hive metastore in Spark SQL specific format
InvalidOperationException(message:partition keys can not be changed.)

However, if I capture the generated ALTER statement and execute it from the Hive GUI (Hue), it runs without issues:

alter table dbo_nwd_orders add columns (newCol int)

Why is that statement valid from the GUI but not from Spark code?

Thank you very much.

Answer 1:

It has been said multiple times here, but just to reiterate: Spark is not a Hive interface and is not designed for full Hive compatibility, either in language (Spark targets the SQL standard, while Hive uses a custom SQL-like query language) or in capabilities (Spark is an ETL solution; Hive is a data warehousing solution).

Even the data layouts are not fully compatible between the two.

Spark with Hive support is Spark with access to the Hive metastore, not Spark that behaves like Hive.
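
For context, here is a minimal sketch of what "Spark with Hive support" means in practice, assuming a standard SparkSession setup. It only wires Spark SQL to the Hive metastore; DDL issued through this session is still handled by Spark's own catalog, not by Hive:

import org.apache.spark.sql.SparkSession

// Enabling Hive support gives Spark access to the Hive metastore.
// ALTER TABLE statements run here go through Spark's HiveExternalCatalog,
// which is what produces the warning seen in the question.
val spark = SparkSession.builder()
  .appName("spark-with-hive-metastore")
  .enableHiveSupport()
  .getOrCreate()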

If you need access to the full set of Hive's features, connect to Hive directly with a native client or a native (non-Spark) JDBC connection, and interact with it from there.
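
As a rough sketch of that approach, the ALTER statement could be issued through Hive's own JDBC interface (HiveServer2) instead of Spark. The host, port, database, and credentials below are placeholder assumptions, and the hive-jdbc driver must be on the classpath:

import java.sql.DriverManager

// Register the Hive JDBC driver (recent hive-jdbc versions auto-register,
// but the explicit call is harmless)
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Placeholder HiveServer2 endpoint; adjust host/port/database/credentials
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "hive", "")
try {
  val stmt = conn.createStatement()
  // Executed by Hive itself, so Hive's DDL semantics apply
  stmt.execute("alter table dbo_nwd_orders add columns (newCol int)")
  stmt.close()
} finally {
  conn.close()
}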