Create external table hive, location contains mult

2020-04-30 02:33发布

问题:

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)  
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  
LOCATION '/user/hive/warehouse/LOGS/test';

under 'test' folder I am writing files daily. for eg:

/user/hive/warehouse/LOGS/test/20170420
/user/hive/warehouse/LOGS/test/20170421
/user/hive/warehouse/LOGS/test/20170422

I cannot see any data inside LOGS table that i have created.

But, I create the table using

LOCATION '/user/hive/warehouse/LOGS/test/20170422';

I can see that days records.

I want to see all the data under /test directory in my HIVE table, also the /test directory is populated daily with new files.

回答1:

Option 1

In order to support sub-directories

set mapred.input.dir.recursive=true;

and if you Hive version is lower than 2.0.0 then also

set hive.mapred.supports.subdirectories=false;

Option 2

Create a partitioned table

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)  
partitioned by (dt date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  
LOCATION '/user/hive/warehouse/LOGS/test';

alter table LOGS add if not exists partition (dt=date '2017-04-20') LOCATION '/user/hive/warehouse/LOGS/test/20170420';
alter table LOGS add if not exists partition (dt=date '2017-04-21') LOCATION '/user/hive/warehouse/LOGS/test/20170421';
alter table LOGS add if not exists partition (dt=date '2017-04-22') LOCATION '/user/hive/warehouse/LOGS/test/20170422';

It would be easier to manage if you keep your directories using the standard convention, e.g. dt=2017-04-20 instead of 20170420



回答2:

By default hive reads only the files (not directories) inside the specified location in external table. If you want to enable adding the directories, then set the below parameter:

set mapred.input.dir.recursive=true;