how to do partitions on subdirectories in hive

2019-03-06 12:23发布

问题:

I have directory structure like below in my hadoop,

`/hadoop/maindirec/subdirect1/file1
 /hadoop/maindirec/subdirect1/file2
 /hadoop/maindirec/subdirect2/file1 
 /hadoop/maindirec/subdirect2/file2
 /hadoop/maindirec/subdirect3/file1 
 /hadoop/maindirec/subdirect3/file2
 /hadoop/maindirec/subdirect4/file1    
 /hadoop/maindirec/subdirect4/file2
 /hadoop/maindirec/subdirect5/file1
 /hadoop/maindirec/subdirect5/file2`

Now i want to create hive table with orc format as maindirec and subdirect1-5 as partiations. could anyone please let me know how it can be done. Thanks in advance.

so far

create external table temp(name string,id int) partitioned by(subd string) row format delimited fields terminated by '\t' stored as orc location '/hadoop/maindirec' tblproperties("orc.compress"="SNAPPY","skip.header.line.count"="4");
alter table temp add partition(subd='subdirect1') location '/hadoop/maindirec/subdirect1' partition(subd='subdirect2') location '/hadoop/maindirec/subdirect2' partition(subd='subdirect3') location '/hadoop/maindirec/subdirect3' partition(subd='subdirect4') location '/hadoop/maindirec/subdirect4' partition(subd='subdirect5') location '/hadoop/maindirec/subdirect5';

Input
select * from temp;
Output
Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem

回答1:

You can use this code:(change and add column names as per your need)

CREATE EXTERNAL TABLE temp_table ( col1 int,col2 int) PARTITIONED BY ( subd string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS ORC LOCATION '/hadoop/maindirec';

ALTER TABLE temp_table ADD PARTITION (subd='subdirect1') LOCATION '/hadoop/maindirec/subdirect1/files1-100'
PARTITION (subd='subdirect2') LOCATION '/hadoop/maindirec/subdirect2/files1-100'
PARTITION (subd='subdirect3') LOCATION '/hadoop/maindirec/subdirect3/files1-100'
PARTITION (subd='subdirect4') LOCATION '/hadoop/maindirec/subdirect4/files1-100'
PARTITION (subd='subdirect5') LOCATION '/hadoop/maindirec/subdirect5/files1-100';



标签: hadoop hive