I have a Django project that uses an SQLite database that can be written to by an external tool. The text is supposed to be UTF-8, but in some cases there will be errors in the encoding. The text comes from an external source, so I cannot control the encoding. Yes, I know that I could write a "wrapping layer" between the external source and the database, but I would prefer not to, especially since the database already contains a lot of "bad" data.
The solution in sqlite is to change the text_factory to something like:
lambda x: unicode(x, "utf-8", "ignore")
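For reference, here is a minimal sketch of that idea in plain `sqlite3`, outside Django. It assumes Python 3, where `unicode` no longer exists and the factory receives `bytes`, so the lambda becomes a `bytes.decode` call. The table name `notes`, the helper name `tolerant_text`, and the sample bytes are made up for illustration.

```python
import sqlite3


def tolerant_text(raw: bytes) -> str:
    """Decode as UTF-8, silently dropping bytes that are not valid UTF-8."""
    return raw.decode("utf-8", "ignore")


conn = sqlite3.connect(":memory:")
conn.text_factory = tolerant_text

# CAST(... AS TEXT) stores the raw bytes as TEXT without validating them,
# which mimics what an external tool writing bad data would leave behind.
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES (CAST(? AS TEXT))", (b"caf\xe9 au lait",))

# The stray 0xE9 byte is dropped instead of raising a decode error.
body = conn.execute("SELECT body FROM notes").fetchone()[0]
print(repr(body))
```

Note that `"ignore"` loses the offending bytes; `"replace"` would keep a visible U+FFFD marker in their place.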
However, I don't know how to tell the Django model driver this.
The exception I get is:
'Could not decode to UTF-8 column 'Text' with text'
in
/var/lib/python-support/python2.5/django/db/backends/sqlite3/base.py in execute
Somehow I need to tell the sqlite driver not to try to decode the text as UTF-8 (at least not using the standard algorithm, but it needs to use my fail-safe variant).
Feed the data through one of the magic str functions from Django:
or
It seems that this problem arises quite often and is of great interest to many people (this question has more than a thousand views and quite a few upvotes).
So here is the answer I found for the problem, which appears to me to be the most convenient one:
I checked the Django sqlite3 connector and added the str conversion directly to the
get_new_connection(...)
function:
It seems to work as it should, and one does not have to deal with the unicode problem in every request individually. Shouldn't adding this to the Django code be considered? I wouldn't suggest that anyone actually modify their Django backend code manually...
In my specific case I needed to set connection.connection.text_factory = str
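To see what that setting changes, here is a small sketch using the standard `sqlite3` module directly rather than Django's `connection.connection`. It assumes Python 3: under Python 2, `str` is a byte string, so `text_factory = str` hands back raw bytes, and the closest Python 3 equivalent is `text_factory = bytes`. The table name and the final Latin-1 decode are illustrative assumptions, not something the thread specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
# Store bytes that are not valid UTF-8 in a TEXT column.
conn.execute("INSERT INTO notes VALUES (CAST(? AS TEXT))", (b"caf\xe9 au lait",))

# With the default text_factory, reading the row fails with the
# "Could not decode to UTF-8 column ..." error from the question.
try:
    conn.execute("SELECT body FROM notes").fetchone()
    strict_error = None
except sqlite3.OperationalError as exc:
    strict_error = exc

# Returning raw bytes skips decoding entirely; the caller decides later.
conn.text_factory = bytes
raw = conn.execute("SELECT body FROM notes").fetchone()[0]

# Assumption for this example only: the stray byte was Latin-1.
text = raw.decode("latin-1")
```

The trade-off is that every text column now comes back as bytes, so the decoding has to happen somewhere in application code.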
Inspired by Milla's answer, consider the following monkey-patch that installs a more tolerant text_factory into the Django SQLite connection. Use it when you cannot control how text is added to the SQLite database and it might not be in UTF-8. Of course, the encoding used here may not be the right one, but at least your application won't crash.
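The shape of such a patch can be sketched without Django installed. In the sketch below, `DatabaseWrapper` is a hypothetical stand-in class, not Django's real one; it only mimics a `get_new_connection(conn_params)` method so the wrapping technique can be run on its own. The names `tolerant_text` and `get_new_connection_tolerant` are also invented, the code assumes Python 3, and the choice of `"replace"` is an assumption.

```python
import sqlite3


class DatabaseWrapper:
    """Hypothetical stand-in for Django's SQLite DatabaseWrapper."""

    def get_new_connection(self, conn_params):
        return sqlite3.connect(**conn_params)


def tolerant_text(raw: bytes) -> str:
    # Invalid bytes become U+FFFD instead of raising a decode error.
    return raw.decode("utf-8", "replace")


# Keep a reference to the original method so the patch can delegate to it.
_original_get_new_connection = DatabaseWrapper.get_new_connection


def get_new_connection_tolerant(self, conn_params):
    conn = _original_get_new_connection(self, conn_params)
    conn.text_factory = tolerant_text
    return conn


# The monkey-patch: every connection created from now on is tolerant.
DatabaseWrapper.get_new_connection = get_new_connection_tolerant

conn = DatabaseWrapper().get_new_connection({"database": ":memory:"})
value = conn.execute("SELECT CAST(? AS TEXT)", (b"caf\xe9 au lait",)).fetchone()[0]
```

In a real project the same wrapping would be applied to the backend's own class once at startup, before any query runs, which avoids editing the installed Django files.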
Have you tried
before running any queries?