The file that I am loading is separated by ' ' (white space). Below is the file. The file resides in HDFS:-
001 000
001 000
002 001
003 002
004 003
005 004
006 005
007 006
008 007
099 007
1> I am creating an external table and loading the file by issuing the below command:-
CREATE EXTERNAL TABLE IF NOT EXISTS graph_edges (src_node_id STRING COMMENT 'Node ID of Source node', dest_node_id STRING COMMENT 'Node ID of Destination node') ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/user/hadoop/input';
2> After this, I am simply inserting the table in another file by issuing the below command:-
INSERT OVERWRITE DIRECTORY '/user/hadoop/output' SELECT * FROM graph_edges;
3> Now, when I cat the file, the fields are not separated by any delimiter:-
hadoop dfs -cat /user/hadoop/output/000000_0
Output:-
001000
001000
002001
003002
004003
005004
006005
007006
008007
099007
Can somebody please help me out? Why is the delimiter being removed and how to delimit the output file?
In the CREATE TABLE command I tried DELIMITED BY '\t'
but then I am getting unnecessary NULL column.
Any pointers help much appreciated. I am using Hive 0.9.0 version.
You can provide delimiter when writing to directories
This should work for you.
I suspect that hive actually is writing a contol-A as the delimeter, but when you do a cat to the screen it is not showing up to your eye.
Instead try bringing up the file in vi or head the file if you only want to see a little of it, and vi the result:
hadoop dfs -cat /user/hadoop/output/000000_0 | head > my_local_file.txt
vi my_local_file.txt
You should be able to see the ^A characters in there.
This would be a better solution I suppose though its a round about way of achieving.
INSERT OVERWRITE DIRECTORY '/user/hadoop/output' SELECT src_node_id,' ',dest_node_id FROM graph_edges;
I have some different voice.
Indeed, Hive does not support custom delimiter.
But when you use
INSERT OVERWRITE DIRECTORY
, there are delimiters in your lines. The delimiter is'\1'
.You can use
hadoop dfs -cat $file | head -1 | xxd
to find it out or get the file from HDFS to local machine and open it with vim. There will be some char like '^A' in your vim which is the delimiter.Back to the question, You can use a simple way to solve it.
Still use
INSERT OVERWRITE DIRECTORY '/user/hadoop/output'
to generate/user/hadoop/output
;Create external table whose fields delimited by
'\1'
: