Hive How to extract data and write to local files

2019-06-06 03:24发布

问题:

I am trying to extract data from Hive table and write to local files:

One output file per a column "Date" value. My Hive table will have about 2+ years history of data, that means I will need about 700+ different output files.

My current knowledge will only allow me to write one file per a run, this is my code can be run in Hive command line:

INSERT OVERWRITE LOCAL DIRECTORY '/local/hive/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select date, col1, col2, col3, col4, col5
from WH_TEMP_EXTRACT.table_temp
where date='2015-09-17';

I am not a developer, but currently in the process of researching all options to perform this task. I appreciate any help you can provide here.

回答1:

Extract all the 2 year data in a single query into the local file. After that you can use awk command to get them into individual files as below.

/tmp/hive> ls -l
total 4
-rw-r--r-- 1 xxxxxxx yyyyyy 228 Sep 20 10:11 hive_extract.dat
/tmp/hive> cat hive_extract.dat
2018-09-17,abc,134
2018-09-17,abc,135
2018-09-17,abc,136
2018-09-17,abc,137
2018-09-17,abc,138
2018-09-18,abc,141
2018-09-18,abc,142
2018-09-18,abc,143
2018-09-18,abc,144
2018-09-19,abc,150
2018-09-19,abc,151
2018-09-19,abc,152
/tmp/hive> awk -F"," '{ print $0 > "file_"$1 }' hive_extract.dat
/tmp/hive> ll
total 28
-rw-r--r-- 1 xxxxxxx yyyyyy 228 Sep 20 10:11 hive_extract.dat
-rw-r--r-- 1 xxxxxxx yyyyyy  57 Sep 20 10:13 file_2018-09-19
-rw-r--r-- 1 xxxxxxx yyyyyy  76 Sep 20 10:13 file_2018-09-18
-rw-r--r-- 1 xxxxxxx yyyyyy  95 Sep 20 10:13 file_2018-09-17
/tmp/hive> cat file_2018-09-17
2018-09-17,abc,134
2018-09-17,abc,135
2018-09-17,abc,136
2018-09-17,abc,137
2018-09-17,abc,138
/tmp/hive> cat file_2018-09-18
2018-09-18,abc,141
2018-09-18,abc,142
2018-09-18,abc,143
2018-09-18,abc,144
/tmp/hive> cat file_2018-09-19
2018-09-19,abc,150
2018-09-19,abc,151
2018-09-19,abc,152
/tmp/hive>

let me know if this solution will work for you.

EDIT 1: use gsub

awk -F"," '{ gsub("-","_",$1); print $0 > "file_"$1 }' hive_extract.dat

EDIT 2:

awk -F"," 'BEGIN { OFS=","} { gsub("-","_",$1); print $0 > "file_"$1 }' hive_extract.dat

EDIT 3:

awk -F"," '{ fx=$1;gsub("-","_",fx);print $0 > "file_"fx }' hive_extract.dat