Is there a way to prevent a Hive table from being

2020-07-27 04:20发布

问题:

I am developing a batch job that loads data into Hive tables from HDFS files. The flow of data is as follows

  1. Read the file received in HDFS using an external Hive table
  2. INSERT OVERWRITE the final hive table from the external Hive table applying certain transformations
  3. Move the received file to Archive

This flow works fine if there is a file in the input directory for the external table to read during step 1. If there is no file, the external table will be empty and as a result executing step 2 will empty the final table. If the external table is empty, I would like to keep the existing data in the final table (the data loaded during the previous execution).

Is there a hive property that I can set so that the final table is overwritten only if we are overwriting it with some data?

I know that I can check if the input file exists using an HDFS command and conditionally launch the Hive requests. But I am wondering if I can achieve the same behavior directly in Hive which would help me avoid this extra verification

回答1:

Try to add dummy partition to your table, say LOAD_TAG and use dynamic partition load:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE your_table PARTITION(LOAD_TAG)
select
      col1,
      ...
      colN,
      'dummy_value' as LOAD_TAG
  from source_table;

The partition value should always be the same in your case.