CTAS with Dynamic Partition

2019-08-05 20:08发布

问题:

I want to change an existing table, that contains text format, into orc format. I was able to do it by: (1) creating a table in orc format manually having the partitions and then, (2) using the INSERT OVERWRITE statement to populate the table.

I am trying to use CTAS (Create Table... AS Select...) statement for this. Is there any way I can include the dynamic partitioning with CTAS statement? So, if my text data set has multiple partitions (for example: year and month), can I point this in CTAS statement directly?

The format might be something like this:

CREATE TABLE TEST_TABLE
STORED AS ORC
WITH PARTITION(year, month)
LOCATION '/<my_location>'
tblproperties ("orc.compress"="SNAPPY")
AS SELECT * FROM <existing_table>;

Any idea please?

Note: The reason I'm interested for CTAS statement is as: using this statement, I don't really need to point out each and every columns name. But if I create an orc based table and then populate it manually, I had to indicate all the columns while creating the table. This is okay; but, not a good idea if my existing table contains a lot of columns.

回答1:

Not supported

hive> create table t partitioned by (p int) as select 1 as i;

FAILED: SemanticException [Error 10068]:
CREATE-TABLE-AS-SELECT does not support partitioning in the target table



回答2:

Two Steps:

CREATE TABLE target_table_name LIKE source_table_name;

INSERT OVERWRITE TABLE target_table_name PARTITION(partition_column_name)
SELECT * FROM source_table_name;

Rest stored as ORC, location you can add as per convenience and support by hive query.

For external tables, the steps do not work. The issue comes is, data is updated in pointed directory on HDFS but we cant see changes reflected in table when we do select table. For this you may follow following steps:

Solution 1: Check tables metadata if its updated or not.

Solution 2: If solution 1 does not work, then:

CREATE TABLE target_table_name LIKE source_table_name;

INSERT OVERWRITE TABLE target_table_name PARTITION(partition_column_name) 
  SELECT * FROM source_table_name;

CREATE EXTERNAL TABLE another_table_name LIKE source_table_name 
  STORED AS file_format_of_source_table 
  LOCATION 'location_of_source_table';

Fix the table

 MSCK REPAIR TABLE another_table;

Then you can drop source_table and rename another_table to source_table_name.