How to handle fields enclosed within quotes(CSV) i

Posted 2019-02-02 13:32

I am trying to use EMR/Hive to import data from S3 into DynamoDB. My CSV file has fields that are enclosed in double quotes and separated by commas. When creating the external table in Hive, I can specify the delimiter as a comma, but how do I specify that the fields are enclosed in quotes?

If I don't specify this, the values in DynamoDB end up wrapped in two sets of double quotes (""value""), which seems wrong.

I am using the following command to create the external table. Is there a way to specify that fields are enclosed in double quotes?

CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' LOCATION 's3://emrTest/folder';

Any suggestions would be appreciated. Thanks, Jitendra

7 answers
手持菜刀,她持情操
#2 · 2019-02-02 14:16

There are several possible solutions to this problem:

  1. Write a custom SerDe class
  2. Use RegexSerDe
  3. Remove the escaped delimiter characters from the data

Read more at http://grokbase.com/t/hive/user/117t2c6zhe/urgent-hive-not-respecting-escaped-delimiter-characters
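As an illustration of option 2, here is a minimal sketch of the table from the question redefined with RegexSerDe. It is untested and rests on several assumptions: a Hive version that ships the built-in org.apache.hadoop.hive.serde2.RegexSerDe (older releases use the contrib class org.apache.hadoop.hive.contrib.serde2.RegexSerDe instead), every row having exactly four quoted fields, and no field containing an embedded double quote. The table name, columns, and S3 location are copied from the question.

```sql
-- Sketch only: each capture group in input.regex maps to one column, in order.
-- Assumes exactly four double-quoted fields per row and no embedded quotes.
CREATE EXTERNAL TABLE emrS3_import_1 (
  col1 string,
  col2 string,
  col3 string,
  col4 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- Match "a","b","c","d" and capture only the text inside each pair of quotes.
  'input.regex' = '"([^"]*)","([^"]*)","([^"]*)","([^"]*)"'
)
STORED AS TEXTFILE
LOCATION 's3://emrTest/folder';
```

Rows that do not match the pattern come back with NULL columns, so it is worth running a quick SELECT against the table to check for NULLs before copying the data into DynamoDB.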
