I am trying to use EMR/Hive to import data from S3 into DynamoDB. My CSV file has fields which are enclosed within double quotes and separated by comma. While creating external table in hive, I am able to specify delimiter as comma but how do I specify that fields are enclosed within quotes?
If I don’t specify, I see that values in DynamoDB are populated within two double quotes ““value”” which seems to be wrong.
I am using following command to create external table. Is there a way to specify that fields are enclosed within double quotes?
CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' LOCATION 's3://emrTest/folder';
Any suggestions would be appreciated. Thanks Jitendra
Hive doesn't support quoted strings right out of the box. There are two approaches to solving this:
The faster (and arguably more sane) approach is to modify your initial the export process to use a different delimiter so you can avoid quoted strings. This way you can tell Hive to use an external table with a tab or pipe delimiter:
Use the
csv-serde-0.9.1.jar
file in your hive query, see http://illyayalovyy.github.io/csv-serde/Hive now includes an
OpenCSVSerde
which will properly parse those quoted fields without adding additional jars or error prone and slow regex.ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
If you're stuck with the CSV file format, you'll have to use a custom SerDe; and here's some work based on the opencsv libarary.
But, if you can modify the source files, you can either select a new delimiter so that the quoted fields aren't necessary (good luck), or rewrite to escape any embedded commas with a single escape character, e.g. '\', which can be specified within the ROW FORMAT with ESCAPED BY:
Following code solved same type of problem
I was also stuck with the same issue as my fields are enclosed with double quotes and separated by semicolon(;). My table name is employee1.
So I have searched with links and I have found perfect solution for this.
We have to use serde for this. Please download serde jar using this link : https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar
then follow below steps using hive prompt :
and then load data from your given path using below query:
and then run :
Now you will see the magic. Thanks.