I'm using mongodb and redis, redis is my cache.
I'm caching mongodb objects with redis-py:
obj in mongodb: {u'name': u'match', u'section_title': u'\u6d3b\u52a8', u'title':
u'\u6bd4\u8d5b', u'section_id': 1, u'_id': ObjectId('4fb1ed859b10ed2041000001'), u'id': 1}
the obj fetched from redis with hgetall(key, obj) is:
{'name': 'match', 'title': '\xe6\xaf\x94\xe8\xb5\x9b', 'section_title':
'\xe6\xb4\xbb\xe5\x8a\xa8', 'section_id': '1', '_id': '4fb1ed859b10ed2041000001', 'id': '1'}
As you can see, obj fetched from cache is str instead of unicode, so in my app, there is error s like :'ascii' codec can't decode byte 0xe6 in position 12: ordinal not in range(128)
Can anyone give some suggestions? thank u
Update, for global setting, check jmoz's answer.
If you're using third-party lib such as django-redis
, you may need to specify a customized ConnectionFactory
:
class DecodeConnectionFactory(redis_cache.pool.ConnectionFactory):
def get_connection(self, params):
params['decode_responses'] = True
return super(DecodeConnectionFactory, self).get_connection(self, params)
Assuming you're using redis-py, you'd better to pass str
instead of unicode
to Redis, or else Redis will encode it automatically for *set
commands, normally in UTF-8. For the *get
commands, Redis has no idea about the formal type of a value and has to just return the value in str
directly.
Thus, As Denis said, the way that you storing the object to Redis is critical. You need to transform the value to str
to make the Redis layer transparent for you.
Also, set the default encoding to UTF-8 instead of using ascii
I think I've discovered the problem. After reading this, I had to explicitly decode from redis which is a pain, but works.
I stumbled across a blog post where the author's output was all unicode strings which was obv different to mine.
Looking into the StrictRedis.__init__
there is a parameter decode_responses
which by default is False
. https://github.com/andymccurdy/redis-py/blob/273a47e299a499ed0053b8b90966dc2124504983/redis/client.py#L446
Pass in decode_responses=True
on construct and for me this FIXES THE OP'S ISSUE.
for each string you can use the decode
function to transform it in utf-8, e.g. for the value if the title field in your code:
In [7]: a='\xe6\xaf\x94\xe8\xb5\x9b'
In [8]: a.decode('utf8')
Out[8]: u'\u6bd4\u8d5b'
I suggest you always encode to utf-8 before writing to MongoDB or Redis (or any external system). And that you decode('utf-8') when you fecth results, so that you always work with Unicode in Python.