How to handle fields enclosed within quotes (CSV) when importing data from S3 into DynamoDB using EMR/Hive

Posted 2019-02-02 13:32

I am trying to use EMR/Hive to import data from S3 into DynamoDB. My CSV file has fields which are enclosed within double quotes and separated by commas. While creating the external table in Hive, I can specify the delimiter as a comma, but how do I specify that the fields are enclosed within quotes?

If I don't specify this, I see that the values in DynamoDB are populated with the surrounding double quotes ("value"), which seems wrong.

I am using the following command to create the external table. Is there a way to specify that the fields are enclosed within double quotes?

CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' LOCATION 's3://emrTest/folder';

Any suggestions would be appreciated. Thanks, Jitendra

7 answers
Aperson · 2019-02-02 13:56

Hive doesn't support quoted strings right out of the box. There are two approaches to solving this:

  1. Use a different field separator (e.g. a pipe).
  2. Write a custom InputFormat based on OpenCSV.

The faster (and arguably more sane) approach is to modify your initial export process to use a different delimiter so you can avoid quoted strings altogether. This way you can tell Hive to use an external table with a tab or pipe delimiter:

CREATE TABLE foo (
  col1 INT,
  col2 STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
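
With that layout, a row in the re-exported file would look something like the hypothetical sample below (the exact values depend on your export job), and the embedded comma no longer needs any quoting:

1|some value, with an embedded comma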
疯言疯语 · 2019-02-02 13:57

Use the csv-serde-0.9.1.jar file in your Hive query; see http://illyayalovyy.github.io/csv-serde/

add jar /path/to/jar_file;

CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION 's3://emrTest/folder'
TBLPROPERTIES ("skip.header.line.count" = "1");  -- skip the header row, if the file has one
我欲成王，谁敢阻挡 · 2019-02-02 14:04

Hive now includes an OpenCSVSerde which will properly parse quoted fields without adding additional jars or error-prone and slow regexes.

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
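
For example, applied to the table from the question, the full DDL would look something like this (a sketch; it assumes Hive 0.14 or later, where OpenCSVSerde ships with Hive, and reuses the column names and S3 location from the question):

CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field delimiter in the source CSV
  "quoteChar"     = "\""   -- fields are wrapped in double quotes
)
STORED AS TEXTFILE
LOCATION 's3://emrTest/folder';

Note that OpenCSVSerde reads every column as STRING, so cast afterwards if you need other types.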

We Are One · 2019-02-02 14:06

If you're stuck with the CSV file format, you'll have to use a custom SerDe; there is existing work based on the opencsv library.

But if you can modify the source files, you can either select a new delimiter so that the quoted fields aren't necessary (good luck), or rewrite the files to escape any embedded commas with a single escape character, e.g. '\', which can be specified in the ROW FORMAT with ESCAPED BY:

CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' LOCATION 's3://emrTest/folder';
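
With that rewrite, a row in the source file carries its embedded comma escaped instead of quoted, e.g. (a hypothetical sample line):

val1,val2 with an escaped \, comma,val3,val4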
淡お忘 · 2019-02-02 14:06

The following code solved the same type of problem:

CREATE TABLE TableRowCSV2(
    CODE STRING,
    PRODUCTCODE STRING,
    PRICE STRING
)
COMMENT 'row data csv'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\""
)
STORED AS TEXTFILE
TBLPROPERTIES("skip.header.line.count" = "1");
chillily · 2019-02-02 14:10

I was also stuck with the same issue, as my fields are enclosed in double quotes and separated by semicolons (;). My table name is employee1.

I searched around and found a solution that works.

We have to use a SerDe for this. Download the SerDe jar from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar

Then follow the steps below at the Hive prompt:

add jar path/to/csv-serde.jar;

create table employee1(id string, name string, addr string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties (
  "separatorChar" = ";",
  "quoteChar"     = "\""
)
stored as textfile;

Then load the data from your path with the query below:

load data local inpath 'path/xyz.csv' into table employee1;

Then run:

select * from employee1;

Now you will see the magic. Thanks.
