Is it possible to turn off quote processing in the

2019-01-23 17:28发布

I have CSV files, tab-separated, fields not wrapped in quotes, where field data can contain characters like single quotes, double quotes, pipes and backslashes.

Sample data can look like this:

1       2       "ba$aR\eR\       18

I want to import this data into Postgres using the COPY statement.

When I try to import this using

COPY <tablename> FROM  <filename> NULL AS '';

I get an error psql:-:1: ERROR: missing data for column because Postgres is treating the backslash + tab as an "escaped tab" instead of a backslash followed by the field separator.

So I switched to using the "CSV format" of the COPY operator, like so:

COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' NULL AS '';

Now there's a new error psql:-:1: ERROR: value too long for type character varying(254)

Apparently because it's interpreting the double-quote at the start of field 3 as the field wrapping character.

How can I specify that my data is NOT quoted at all?

3条回答
放我归山
2楼-- · 2019-01-23 18:14

(Added as a new answer since I don't have the reputation yet to comment.)

For the record, since I've been struggling with the same issue, you can use tr to remove \b, instead of just hoping it's not in your text anywhere.

tr -d '\010' < filename.csv > newfile.csv

(Using that \010 is the octal representation of \b).

Since COPY supports reading from STDIN, you can ease the I/O impact by piping tr's output:

cat filename.csv | tr -d '\010' | COPY <tablename> FROM STDIN WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS '';
查看更多
仙女界的扛把子
3楼-- · 2019-01-23 18:19

Workaround (thanks to this comment!)

COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS '';

So basically specifying a quote character that should never be in the text, but that's pretty ugly.

I'd much prefer it if there was in fact a way to turn off quote processing altogether.

查看更多
迷人小祖宗
4楼-- · 2019-01-23 18:23

The mode you want to use for data formatted as you describe is the default text mode. It will pass most characters unhindered into the database. It does not have quote processing, and it's using tabs as delimiters. Using CSV mode will just cause you trouble because you're introducing quoting that you have to work around.

Text mode will pass dollar characters, single and double quotes, pipes, and even backspaces (even though that was not mentioned in the question) right in. The one thing in the example that's not passed through is backslashes. But that's as simple as escaping them, for example by this sed command:

sed -e 's/\\/\\\\/g' < source.txt > processed.txt

Then the processed file should be importable without any additional options:

\copy sometable from processed.txt
查看更多
登录 后发表回答