Handling newline characters in Hive

Posted 2019-01-26 20:21

I have created a table in Hive as follows:

    CREATE TABLE <table_name> (id INT, Description STRING);

My data looks something like this:

 
1|This will return corrupt data since there is a ',' in the first string.
     some text
     Change the data  
2|There is prob in reading data 
    sometext

After the data is loaded into Hive, the Description column cannot be read correctly because the default line terminator is \n, so it displays NULL values. Can anyone suggest how to handle the embedded newlines before loading the data into Hive?
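To make the failure concrete, here is a rough Python model of how a newline-terminated text table splits the sample data. It is not Hive's actual SerDe; it assumes `|` as the field separator and that a non-numeric `id` or a missing `Description` comes back as NULL (shown as `None`):

```python
from typing import List, Optional, Tuple

# Sample data from the question: each logical record spans several physical lines.
RAW = (
    "1|This will return corrupt data since there is a ',' in the first string.\n"
    "     some text\n"
    "     Change the data\n"
    "2|There is prob in reading data\n"
    "    sometext\n"
)

Row = Tuple[Optional[int], Optional[str]]


def parse_like_text_table(raw: str, field_sep: str = "|") -> List[Row]:
    """Rough model of a delimited text table: one row per physical line."""
    rows: List[Row] = []
    for line in raw.split("\n"):
        if not line:
            continue  # skip the empty string left after the trailing newline
        parts = line.split(field_sep)
        try:
            row_id: Optional[int] = int(parts[0])
        except ValueError:
            row_id = None  # non-numeric text in the INT column becomes NULL
        description = parts[1] if len(parts) > 1 else None  # missing field -> NULL
        rows.append((row_id, description))
    return rows


for row in parse_like_text_table(RAW):
    print(row)
```

Under this model five rows come back instead of two: every continuation line becomes a row of its own, with NULL in both columns.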

Tags: hadoop hive
3 answers
Rolldiameter
Reply #2 · 2019-01-26 20:42

I know this question is old, but you have a couple of options. You can't control this with FIELDS TERMINATED BY, because that only controls what terminates the fields, not the records. Records in Hive are hard-coded to be terminated by the newline character (there is a LINES TERMINATED BY clause, but it only accepts '\n').

  1. Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
  2. Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.
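Option 1 comes down to deciding where one record ends and the next begins. The same decision can also be made outside Hive, before loading, which is what the question asks about. Below is a minimal Python sketch of that logic as a preprocessing step (not a Hadoop RecordReader). It assumes every record starts with a numeric id followed by `|` and that replacing each embedded newline with a space is acceptable; `group_records` and `rewrite_file` are hypothetical helper names:

```python
import re
from typing import Iterable, Iterator, List

# Assumption: every record starts with a numeric id followed by "|", e.g. "2|...".
RECORD_START = re.compile(r"^\d+\|")


def group_records(lines: Iterable[str], joiner: str = " ") -> Iterator[str]:
    """Yield one logical record per group of physical lines.

    A line that does not look like a record start is appended to the previous
    record, with the embedded newline replaced by `joiner`.
    """
    current: List[str] = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if RECORD_START.match(line):
            if current:
                yield joiner.join(current)
            current = [line]
        elif current and line.strip():
            current.append(line.strip())
        # Blank lines, and continuation lines seen before the first record, are dropped.
    if current:
        yield joiner.join(current)


def rewrite_file(src_path: str, dst_path: str) -> None:
    """Write one record per line so each record fits a newline-terminated table."""
    # src_path and dst_path are hypothetical placeholders for your own files.
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for record in group_records(src):
            dst.write(record + "\n")


sample = [
    "1|This will return corrupt data since there is a ',' in the first string.\n",
    "     some text\n",
    "     Change the data\n",
    "2|There is prob in reading data\n",
    "    sometext\n",
]
for record in group_records(sample):
    print(record)
```

The trade-off is that the original line breaks inside Description are lost; passing a different `joiner` (for example a marker string you can replace later) keeps them recoverable.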
女痞
Reply #3 · 2019-01-26 20:44

By default, Hive uses the newline character ('\n') as the delimiter. You can change the delimiter using:

    CREATE TABLE <table_name> (id INT, Description STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
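Note that FIELDS TERMINATED BY picks the character used to split a single line into columns; the line itself is still cut at '\n' first. A minimal Python model of just the column split (an illustration, not Hive's SerDe; `split_fields` is a hypothetical name), applied to the first sample line from the question, shows a further catch with ',': the description itself contains a comma.

```python
from typing import List

# First line of the sample data from the question.
LINE = "1|This will return corrupt data since there is a ',' in the first string."


def split_fields(line: str, field_sep: str) -> List[str]:
    """Split one already-separated line into columns on field_sep."""
    return line.split(field_sep)


print(split_fields(LINE, "|"))  # two columns: id and description
print(split_fields(LINE, ","))  # the comma inside the text splits the description
```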
Viruses.
Reply #4 · 2019-01-26 20:45

Try adding the property below to hive-site.xml, or just set it for the current Hive session:

    hive.query.result.fileformat=SequenceFile
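As its name suggests, hive.query.result.fileformat concerns the format of query results, so it is unlikely to change how an existing text table is split into rows. A binary format such as SequenceFile (or Parquet, from the first answer) copes with embedded newlines because records are framed by explicit lengths rather than by '\n'. A minimal Python sketch of length-prefixed framing shows the idea; it is a simplified illustration, not the real SequenceFile layout, which also has a header, separate key and value lengths, and sync markers:

```python
import io
import struct
from typing import BinaryIO, Iterator, List


def write_records(out: BinaryIO, records: List[str]) -> None:
    """Write each record as a 4-byte big-endian length followed by its UTF-8 bytes."""
    for rec in records:
        data = rec.encode("utf-8")
        out.write(struct.pack(">I", len(data)))
        out.write(data)


def read_records(inp: BinaryIO) -> Iterator[str]:
    """Read records back using the stored lengths; newlines are just data."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            return  # end of stream (or a truncated header)
        (length,) = struct.unpack(">I", header)
        yield inp.read(length).decode("utf-8")


# Two records from the question, with their embedded newlines kept intact.
records = [
    "1|This will return corrupt data\n     some text\n     Change the data",
    "2|There is prob in reading data\n    sometext",
]
buf = io.BytesIO()
write_records(buf, records)
buf.seek(0)
restored = list(read_records(buf))
print(restored == records)  # prints True
```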
