Hive INSERT OVERWRITE DIRECTORY command output is not separated by a delimiter. Why?

Posted 2019-03-10 14:10

The file that I am loading is separated by ' ' (white space) and resides in HDFS. Below is the file:-

001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007

1> I am creating an external table and loading the file by issuing the below command:-

CREATE EXTERNAL TABLE IF NOT EXISTS graph_edges (src_node_id STRING COMMENT 'Node ID of Source node', dest_node_id STRING COMMENT 'Node ID of Destination node') ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/user/hadoop/input';

2> After this, I am simply inserting the table in another file by issuing the below command:-

INSERT OVERWRITE DIRECTORY '/user/hadoop/output' SELECT * FROM graph_edges;

3> Now, when I cat the file, the fields are not separated by any delimiter:-

hadoop dfs -cat /user/hadoop/output/000000_0

Output:-

001000
001000
002001
003002
004003
005004
006005
007006
008007
099007

Can somebody please help me out? Why is the delimiter being removed, and how can I get a delimited output file?

In the CREATE TABLE command I tried FIELDS TERMINATED BY '\t', but then I get an unnecessary NULL column.

Any pointers would be much appreciated. I am using Hive version 0.9.0.

Tags: hadoop hive
10 answers
[account banned]
#2 · 2019-03-10 14:39

The default separator is "^A" (Ctrl-A). In Python, it is "\x01".

When I want to change the delimiter, I select a literal delimiter column between the real columns:

SELECT col1, delimiter, col2, delimiter, col3, ... FROM table

Then treat delimiter + "^A" as the new delimiter.
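
For the table in the question, a minimal sketch of this trick (choosing a literal ',' as the visible delimiter) would be:

INSERT OVERWRITE DIRECTORY '/user/hadoop/output' SELECT src_node_id, ',', dest_node_id FROM graph_edges;

Each output line then comes out as 001^A,^A000, so downstream consumers can split on the three-character sequence ^A,^A.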

The star\"
#3 · 2019-03-10 14:40

The problem is that Hive does not allow you to specify the output delimiter - https://issues.apache.org/jira/browse/HIVE-634

The solution is to create an external table for the output (with a delimiter specification) and INSERT OVERWRITE TABLE instead of DIRECTORY.

--

Assuming that you have /user/hadoop/input/graph_edges.csv in HDFS,

hive> create external table graph_edges (src string, dest string) 
    > row format delimited 
    > fields terminated by ' ' 
    > lines terminated by '\n' 
    > stored as textfile location '/user/hadoop/input';

hive> select * from graph_edges;
OK
001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007

hive> create external table graph_out (src string, dest string) 
    > row format delimited 
    > fields terminated by ' ' 
    > lines terminated by '\n' 
    > stored as textfile location '/user/hadoop/output';

hive> insert overwrite table graph_out select * from graph_edges;
hive> select * from graph_out;
OK
001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007

[user@box] hadoop fs -get /user/hadoop/output/000000_0 .

Comes back as above, with spaces.

唯我独甜
#4 · 2019-03-10 14:44

I think you can achieve the output you want using the concat_ws function:

INSERT OVERWRITE DIRECTORY '/user/hadoop/output' SELECT concat_ws(',', src_node_id, dest_node_id) FROM graph_edges;

Here I have chosen a comma as the column delimiter.
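
With the sample data from the question, the resulting file would contain lines such as 001,000. Since concat_ws is variadic, additional columns can go into the same call, e.g. concat_ws(',', col1, col2, col3).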

Ridiculous、
#5 · 2019-03-10 14:46

I had this issue where the output of the Hive query results needed to be pipe-delimited. Running this sed command, you can replace ^A with |:

sed 's#\x01#|#g' test.log > piped_test.log
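
Applied to the output directory from the question (assuming GNU sed, which understands the \x01 escape, and Hive's usual 000000_0 output file name), the full round trip might look like:

hadoop fs -cat /user/hadoop/output/000000_0 | sed 's#\x01#|#g' > piped_test.log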

Summer. ? 凉城
#6 · 2019-03-10 14:48

While the question is over two years old and the top answer was correct at the time, it is now possible (since Hive 0.11, HIVE-3682) to tell Hive to write delimited data to a directory.

Here is an example of outputting the data with the traditional ^A separator:

INSERT OVERWRITE DIRECTORY '/output/data_delimited'
SELECT *
FROM data_schema.data_table;

And now with tab delimiters:

INSERT OVERWRITE DIRECTORY '/output/data_delimited'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
SELECT *
FROM data_schema.data_table;
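
To verify the result (assuming Hive's usual 000000_0 output file name), something like:

hadoop fs -cat /output/data_delimited/000000_0

should now show tab-separated fields.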
兄弟一词,经得起流年.
#7 · 2019-03-10 14:52

You can use the "row format delimited fields terminated by '|'" clause (like the previous answer, this requires Hive 0.11 or later). In your case it would be:

INSERT OVERWRITE DIRECTORY '/user/hadoop/output' row format delimited fields terminated by '|' SELECT * FROM graph_edges;
