I have CSV files, tab-separated, fields not wrapped in quotes, where field data can contain characters like single quotes, double quotes, pipes and backslashes.
Sample data can look like this:
1 2 "ba$aR\eR\ 18
I want to import this data into Postgres using the COPY statement.
When I try to import this using
COPY <tablename> FROM <filename> NULL AS '';
I get an error psql:-:1: ERROR: missing data for column
because Postgres is treating the backslash + tab as an "escaped tab" instead of a backslash followed by the field separator.
So I switched to using the "CSV format" of the COPY operator, like so:
COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' NULL AS '';
Now there's a new error psql:-:1: ERROR: value too long for type character varying(254)
Apparently because it's interpreting the double-quote at the start of field 3 as the field wrapping character.
How can I specify that my data is NOT quoted at all?
(Added as a new answer since I don't have the reputation yet to comment.)
For the record, since I've been struggling with the same issue, you can use
tr
to remove\b
, instead of just hoping it's not in your text anywhere.(Using that
\010
is the octal representation of\b
).Since
COPY
supports reading fromSTDIN
, you can ease the I/O impact by pipingtr
's output:Workaround (thanks to this comment!)
So basically specifying a quote character that should never be in the text, but that's pretty ugly.
I'd much prefer it if there was in fact a way to turn off quote processing altogether.
The mode you want to use for data formatted as you describe is the default text mode. It will pass most characters unhindered into the database. It does not have quote processing, and it's using tabs as delimiters. Using CSV mode will just cause you trouble because you're introducing quoting that you have to work around.
Text mode will pass dollar characters, single and double quotes, pipes, and even backspaces (even though that was not mentioned in the question) right in. The one thing in the example that's not passed through is backslashes. But that's as simple as escaping them, for example by this
sed
command:Then the processed file should be importable without any additional options: