Inserting valid JSON with COPY into a Postgres table

Posted 2020-03-12 04:35

Question:

Valid JSON can naturally have the backslash character: \. When you insert data in a SQL statement like so:

sidharth=# create temp table foo(data json);
CREATE TABLE
sidharth=# insert into foo values( '{"foo":"bar", "bam": "{\"mary\": \"had a lamb\"}" }');
INSERT 0 1

sidharth=# select * from foo;
                        data
-----------------------------------------------------
 {"foo":"bar", "bam": "{\"mary\": \"had a lamb\"}" }
(1 row)

Things work fine.

But if I copy the JSON to a file and run the copy command I get:

sidharth=# \copy foo from './tests/foo' (format text);
ERROR:  invalid input syntax for type json
DETAIL:  Token "mary" is invalid.
CONTEXT:  JSON data, line 1: {"foo":"bar", "bam": "{"mary...
COPY foo, line 1, column data: "{"foo":"bar", "bam": "{"mary": "had a lamb"}" }"

It seems Postgres is not processing the backslashes. I think this is because of http://www.postgresql.org/docs/8.3/interactive/sql-syntax-lexical.html, and so I am forced to use double backslashes. That does work, i.e. when the file contents are:

{"foo":"bar", "bam": "{\\"mary\\": \\"had a lamb\\"}" }  

Then the copy command works. But is it correct to expect special treatment for the json data type? After all, the content above is not valid JSON.

Answer 1:

http://adpgtech.blogspot.ru/2014/09/importing-json-data.html

copy the_table(jsonfield) 
from '/path/to/jsondata' 
csv quote e'\x01' delimiter e'\x02';


Answer 2:

PostgreSQL's default bulk load format, text, is a tab-separated format. It requires backslashes to be escaped because the backslash is itself the escape character there, used for (e.g.) the \N null placeholder.

Observe what PostgreSQL generates:

regress=> COPY foo TO stdout;
{"foo":"bar", "bam": "{\\"mary\\": \\"had a lamb\\"}" }

This isn't a special case for json at all; it's true of any string. Consider, for example, that a string, including json, might contain embedded tabs. Those must be escaped to prevent them from being read as a field separator.
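
As a rough illustration of that escaping, here is a minimal Python sketch that prepares one JSON value for the text format. The helper name escape_copy_text is hypothetical (it is not part of PostgreSQL or any library), and it assumes the default tab delimiter; it only handles backslash, tab, newline and carriage return.

```python
import json

def escape_copy_text(value: str) -> str:
    """Escape one field for COPY's text format (hypothetical helper,
    assumes the default tab delimiter)."""
    return (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
             .replace("\t", "\\t")   # tab would otherwise start a new field
             .replace("\n", "\\n")   # newline would otherwise start a new row
             .replace("\r", "\\r")
    )

# The document from the question: a JSON string that itself contains JSON.
doc = {"foo": "bar", "bam": json.dumps({"mary": "had a lamb"})}
line = escape_copy_text(json.dumps(doc))
print(line)
```

The printed line has doubled backslashes, the same shape that COPY foo TO stdout produces above.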

You'll need to generate your input data properly escaped. Rather than trying to use the PostgreSQL-specific text format, it'll generally be easier to use format csv and a tool that writes correct CSV, so the escaping is done for you on writing.
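
One way to do that, sketched here with Python's standard csv module. The function name write_json_rows_as_csv and the in-memory buffer are illustrative assumptions; in practice you would write to a real file and load it with something like \copy foo from 'foo.csv' (format csv), where foo.csv is a hypothetical file name.

```python
import csv
import io
import json

def write_json_rows_as_csv(docs, out):
    """Write one JSON document per row as a single quoted CSV field
    (hypothetical helper)."""
    # QUOTE_ALL wraps each field in double quotes and doubles any embedded
    # quote; backslashes are written unchanged.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for doc in docs:
        writer.writerow([json.dumps(doc)])

docs = [{"foo": "bar", "bam": json.dumps({"mary": "had a lamb"})}]
buf = io.StringIO()  # stand-in for a real output file
write_json_rows_as_csv(docs, buf)
print(buf.getvalue())
```

In csv format the backslash has no special meaning by default, so the JSON escapes pass through untouched and only the double quotes are doubled by the writer.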