I have created a table in hive as
Create table(id int, Description String)
My data looks something as follows :
1|This will return corrupt data since there is a ',' in the first string.
some text
Change the data
2|There is prob in reading data
sometext
After the data is loaded into hive since the default line terminator is \n, the description column cannot be read by hive, Hence it displays a NULL value. Can anyone suggest how to handle newline before loading into hive.
I know this question is old, but you have a couple of options. You can't control this with FIELDS TERMINATED BY
, because that only controls what terminates the fields, not the records. Records in Hive are hard-coded to be terminated by the newline character (even though there is a LINES TERMINATED BY
clause, it is not implemented).
- Write a custom
InputFormat
that uses a RecordReader
that
understands non-newline delimited records. Look at the code for
LineReader
/LineRecordReader
and TextInputFormat
.
- Use a format
other than text/ASCII, like Parquet. I would recommend this
regardless, as text is probably the worst format you can store data
in anyway.
try adding the below property in hive-site.xml or you can just try for temporary hive session level.
hive.query.result.fileformat=SequenceFile
By default hive takes in NEWLINE ('\N') as delimiter .
You can change the delimiter using:
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";