I am trying to load data into hive tables which is delimited by double pipe(||). When I try this :
Sample I/P:
1405983600000||111.111.82.41||806065581||session-id
Creating table in hive:
create table test_hive(k1 string, k2 string, k3 string, k4 string,) row format delimited fields terminated by '||' stored as textfile;
Loading data from text file:
load data local inpath '/Desktop/input.txt' into table test_hive;
When I do this it is storing data in the below format:
1405983600000 tabspace-as-second-column 111.111.82.41 tabspace-as-fourth-column
Where as I am expecting the data in table to be
1405983600000 111.111.82.41 806065581 session-id
Kindly help me out I have tried different options on this but unable to resolve it
Multicharater delimiter eg. || is not supported in Hive till ver 0.13 . So fields terminated by || won't work out.There is an alter native for this.
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
SERDE serde_name WITH SERDEPROPERTIES (field.delim='||')
STORED AS TEXTFILE
LOCATION '<hdfs_location>';
The default serde can be used. Multi character delimiters can be used for fields , line , escape characters by specifying them in the serde properties.
This issue has been resolved in hive 14 with the use of multidelimiter serde. Please find documentation here.
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
You could do this if you don't want to use alternate serde or have earlier version of hive:
create external table my_table (line string) location /path/file;
Then create view on top:
create view my_view as select split(line,'\\|\\|')[0] as column_1
, split(line,'\\|\\|')[1] as column_2
, split(line,'\\|\\|')[2] as column_3
from my_table;
Query the view. Good luck.