How can I do a double delimiter (||) in Hive?

Posted on 2019-09-09 22:17

Question:

I am trying to load data into a Hive table from a file that is delimited by a double pipe (||). Here is what I try:

Sample I/P:

1405983600000||111.111.82.41||806065581||session-id

Creating table in hive:

create table test_hive(k1 string, k2 string, k3 string, k4 string) row format delimited fields terminated by '||' stored as textfile;

Loading data from text file:

load data local inpath '/Desktop/input.txt' into table test_hive;

When I do this, the data is stored in the following format:

1405983600000 tabspace-as-second-column 111.111.82.41 tabspace-as-fourth-column

Whereas I am expecting the data in the table to be:

1405983600000 111.111.82.41 806065581 session-id

Kindly help me out. I have tried different options but have been unable to resolve it.

Answer 1:

Multi-character delimiters such as || are not supported in Hive up to version 0.13, so FIELDS TERMINATED BY '||' won't work. There is an alternative:

 CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
 page_url STRING, referrer_url STRING,
 ip STRING COMMENT 'IP Address of the User',
 country STRING COMMENT 'country of origination')
 COMMENT 'This is the staging page view table'
 ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
 WITH SERDEPROPERTIES ('field.delim'='||')
 STORED AS TEXTFILE
 LOCATION '<hdfs_location>';

The default SerDe cannot handle multi-character delimiters, which is why a different SerDe is specified here. Multi-character delimiters for fields, lines, and escape characters can be set by specifying them in the SERDEPROPERTIES.



Answer 2:

This issue has been resolved in Hive 0.14 with the MultiDelimitSerDe. Please find the documentation here: https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
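Applied to the asker's table, a minimal sketch using MultiDelimitSerDe (the SerDe class name follows the linked documentation; the table name, columns, and file path are taken from the question):

```sql
CREATE TABLE test_hive (k1 STRING, k2 STRING, k3 STRING, k4 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'='||')
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/Desktop/input.txt' INTO TABLE test_hive;
```

After loading, each || separated field should land in its own column, so SELECT * returns the four values from the sample row.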



Answer 3:

You could do this if you don't want to use an alternate SerDe, or if you have an earlier version of Hive:

create external table my_table (line string) location '/path/file';

Then create view on top:

create view my_view as select split(line,'\\|\\|')[0] as column_1
, split(line,'\\|\\|')[1] as column_2
, split(line,'\\|\\|')[2] as column_3
, split(line,'\\|\\|')[3] as column_4
from my_table;

Query the view. Good luck.
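For reference, the double escaping is needed because split() takes a Java regular expression, in which | is a metacharacter. A quick sanity check against the sample row from the question (a sketch, not part of the original answer):

```sql
-- '\\|\\|' matches a literal "||": each '\\|' escapes the regex metacharacter '|'
SELECT split('1405983600000||111.111.82.41||806065581||session-id', '\\|\\|');
-- yields an array of the four fields:
-- ["1405983600000","111.111.82.41","806065581","session-id"]
```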



Tags: hadoop hive