Escaping delimiter in Amazon Redshift COPY command

2019-07-17 06:12发布

问题:

I'm pulling data from Amazon S3 into a table in Amazon Redshift. The table contains various columns, where some column data might contain special characters.

The copy command has an option called Delimiter where we can specify the delimiter while pulling the data into the table.

The issue is 2 fold -

When I export (unload command) to S3 using a delimiter - say , - it works fine, but when I try to import into Redshift from S3, the issue creeps in because certain columns contain the ',' operator which the copy command misinterprets as delimiter and throws error.

I tried various delimiters, but the data in my table seems to contain some or other kind of special character which causes the above issue.

I even tried unloading using multiple delimiter - like #% or ~, but when loading from s3 using copy command - the dual delimiter is not supported.

Any solutions?

I think the delimiter can be escaped using \ but for some reason that isn't working either, or maybe I'm not using the right syntax for escaping in copy command.

回答1:

The following example shows the contents of a text file with the field values separated by commas.

12,Shows,Musicals,Musical theatre
13,Shows,Plays,All "non-musical" theatre  
14,Shows,Opera,All opera, light, and "rock" opera
15,Concerts,Classical,All symphony, concerto, and choir concerts

If you load the file using the DELIMITER parameter to specify comma-delimited input, the COPY command will fail because some input fields contain commas. You can avoid that problem by using the CSV parameter and enclosing the fields that contain commas in quote characters. If the quote character appears within a quoted string, you need to escape it by doubling the quote character. The default quote character is a double quotation mark, so you will need to escape each double quotation mark with an additional double quotation mark. Your new input file will look something like this.

12,Shows,Musicals,Musical theatre
13,Shows,Plays,"All ""non-musical"" theatre"
14,Shows,Opera,"All opera, light, and ""rock"" opera"
15,Concerts,Classical,"All symphony, concerto, and choir concerts"


Source :- Load Quote from a CSV File


What I use -

COPY tablename FROM 'S3-Path' CREDENTIALS '' MANIFEST CSV QUOTE '\"' DELIMITER ',' TRUNCATECOLUMNS ACCEPTINVCHARS MAXERROR 2

If I’ve made a bad assumption please comment and I’ll refocus my answer.



回答2:

If the delimiter is appearing within fields, then use the ADDQUOTES parameter with the UNLOAD command:

Places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself.

Then:

If you use ADDQUOTES, you must specify REMOVEQUOTES in the COPY if you reload the data.

A popular delimiter is the pipe character (|) that is rare in text files.