I have a SQL query that I execute like this with an SQLAlchemy engine:
result = engine.execute('SELECT utf_8_field FROM table')
The database is MySQL and the column type is TEXT with UTF-8 encoding. The type of the returned utf_8_field is "str", even if I set the option convert_unicode=True when creating the engine. What happens now is that if I have a character like 'é' in my string (which is not in 7-bit ASCII, but is in the extended ASCII set), I get a UnicodeDecodeError when trying to execute this:
utf_8_field.encode("utf-8")
The exact error is:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1: ordinal not in range(128)
When looking into this, I found that str.encode do not support the extended ASCII character set! I find this really strange, but that's another question.
What I don't understand is why SQLAlchemy is not giving me a unicode string. I was previously using DB-API and that was working fine. I also don't have SQLAlchemy table objects for my tables yet, that's why I'm using an execute command.
Any idea?
To convert from an UTF-8 bytestring to a unicode object, you need to decode:
Also, when executing a raw
SELECT
through.execute
, SQLAlchemy has no metadata to work out that your query is returning utf-8 data, so it is not converting this information to unicode for you.In other words,
convert_unicode
only works if you use the SQLAlchemy SQL expression API or the ORM functionality.EDIT: As pointed out, your data is not even UTF-8 encoded;
0xe9
in UTF-8 would indicate a character between\u9000
and\u9fff
, which are CJK unified ideographs while you said it was a latin-1 character, whose UTF-8 code would start with0xc3
. This is probablyISO-8859-1
(latin-1) or similar instead:The conclusion then is to tell SQLAlchemy to connect with a different character set, using the
charset=utf8
parameter, as pointed out by @mata.If you want the data converted automatically, you should specify the charset when you create the engine:
Setting
use_unicode
alone won't tell sqlalchemy which charset to use.