I am trying to export a table to a local csv file in hive.
INSERT OVERWRITE LOCAL DIRECTORY '/home/sofia/temp.csv'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
select * from mytable;
The problem is that some of the values contain the newline "\n" character and the resulting file becomes really messy.
Is there any way of enclosing the values in quotes when exporting in Hive, so that the csv file can contain special characters (and especially the newline)?
One possible solution could be to use Hive CSV SerDe (Serializer/Deserializer)
. It provides a way to specify custom delimiters, quote, and escape characters
.
Limitation:
It does not handle embedded newlines
Availability:
The CSV Serde
is available in Hive 0.14 and greater.
Background:
The CSV SerDe
is based from https://github.com/ogrodnek/csv-serde, and was added to the Hive
distribution in HIVE-7777.
Usage:
This SerDe
works for most CSV data
, but does not handle embedded newlines
. To use the SerDe
, specify the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde
.
original documentation is available at https://github.com/ogrodnek/csv-serde.
CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
STORED AS TEXTFILE;
Default separator, quote, and escape characters if unspecified
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER "
DEFAULT_SEPARATOR ,
Reference: Hive csv-serde