In my application I use a PostgreSQL table with a "text" column to store pickled Python objects. As the database driver I'm using psycopg2, and until now I only passed Python strings (not unicode objects) to the DB and retrieved strings from it. This worked fine until I recently decided to handle strings the correct way and added the following to my DB layer:
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
This works fine everywhere in my application, and I now use unicode objects wherever possible.
But it causes trouble in one special case: the text column that holds the pickled objects. I got it working on my test system this way:
- retrieving the data:
SELECT data::bytea, params FROM mytable
- writing the data:
cursor.execute("UPDATE mytable SET data=%s", (psycopg2.Binary(cPickle.dumps(x)),))
... but unfortunately, on the production system the SELECT fails for some rows with:
psycopg2.DataError: invalid input syntax for type bytea
This error also happens when I try to run the query in the psql shell.
Ultimately I plan to convert the column from "text" to "bytea", but the error above also prevents that conversion.
As far as I can see (when retrieving the column as a plain Python string), the string contains only characters with ord(c) <= 127.
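That ord(c) <= 127 check can be scripted. Below is a minimal Python 3 sketch, assuming the stored values are available as Python strings; the helper is_ascii and the sample lists are hypothetical, and Python 3's pickle stands in for Python 2's cPickle. It also hints at why the observation need not hold for every pickle: protocol 0 can emit non-ASCII bytes, for example when it holds a non-ASCII str.

```python
import pickle


def is_ascii(s):
    """Return True if every character satisfies ord(c) <= 127."""
    return all(ord(c) <= 127 for c in s)


# Hypothetical sample values. latin-1 maps each byte to exactly one character,
# so decoding never fails and the check sees every raw byte.
ascii_text = pickle.dumps(["spam", 42], protocol=0).decode("latin-1")
# Protocol 0 stores U+00E9 as the single byte 0xE9, so this value is not ASCII.
accented_text = pickle.dumps(["caf\u00e9"], protocol=0).decode("latin-1")

print(is_ascii(ascii_text), is_ascii(accented_text))
```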
The problem is that casting text to bytea doesn't mean "take the bytes in the string and assemble them into a bytea value". Instead it means "take the string and interpret it as an escaped input value for the bytea type". That won't work, mainly because pickle data contains lots of backslashes, which bytea interprets specially. Try this instead:
SELECT convert_to(data, 'LATIN1'), params FROM mytable
This converts the string into a byte sequence (a bytea value) in the LATIN1 encoding. For you the exact encoding doesn't matter, because the data is all ASCII (but there is no ASCII encoding to choose).
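To make the difference between the two conversions concrete, here is a rough Python 3 model. It is a simplified sketch, not PostgreSQL's actual code: text_cast_to_bytea mimics how the bytea escape input format treats backslashes (a backslash must be followed by another backslash or by three octal digits, otherwise the input is rejected), and convert_to_latin1 mimics encoding the characters as LATIN1 bytes without any escape parsing. Both function names and the sample object are hypothetical, and Python 3's pickle stands in for cPickle. The sample is a protocol-0 pickle, which is ASCII here but writes a backslash as the six characters \u005c.

```python
import pickle
import re

# Three octal digits: the only non-backslash sequence allowed after a backslash.
_OCTAL = re.compile(r"[0-3][0-7]{2}")


def text_cast_to_bytea(text):
    """Simplified model of data::bytea: parse text as bytea escape-format input.

    Assumption: only the escape format is modelled; the hex format that newer
    PostgreSQL versions accept for input starting with a backslash and 'x' is ignored.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            # Assumption: UTF8 database; for ASCII text any encoding gives the same bytes.
            out += ch.encode("utf-8")
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(0x5C)  # two backslashes stand for a single backslash byte
            i += 2
        elif _OCTAL.fullmatch(text[i + 1:i + 4]):
            out.append(int(text[i + 1:i + 4], 8))  # backslash + ooo is one byte in octal
            i += 4
        else:
            raise ValueError("invalid input syntax for type bytea")
    return bytes(out)


def convert_to_latin1(text):
    """Rough analogue of convert_to(data, 'LATIN1'): plain encoding, no escapes."""
    return text.encode("latin-1")


obj = {"path": "C:\\temp"}
# Protocol 0 writes the backslash in C:\temp as \u005c, so the text contains "\u".
stored_text = pickle.dumps(obj, protocol=0).decode("ascii")

try:
    text_cast_to_bytea(stored_text)
    cast_error = None
except ValueError as exc:
    cast_error = str(exc)

restored = pickle.loads(convert_to_latin1(stored_text))
print("data::bytea ->", cast_error)
print("convert_to  ->", restored)
```

In this model a value with no backslashes passes through the cast unchanged, which may explain why only some rows fail. For the planned column change, the same function can presumably be used in the conversion expression, e.g. ALTER TABLE mytable ALTER COLUMN data TYPE bytea USING convert_to(data, 'LATIN1'); (not verified against your data here).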