Unicode in Python

Posted 2019-04-16 16:46

Question:

I am using elixir with my MySQL database and redis-py with redis, and I have selected UTF-8 everywhere. I want to get back data written in Chinese, like {'Info': '8折', 'Name': '家乐福'}, but what I get looks like this:

{'Info': u'8\u6298', 'Name': u'\u5bb6\u4e50\u798f'}

After I store this dict in redis and read it back with redis-py, it becomes:

{"Info": "8\u6298", "Name": "\u5bb6\u4e50\u798f"}

I know that if I put u' in front of 8\u6298 and print it, it shows "8折", but is there a function or some other solution to this problem?

Answer 1:

The latter looks like JSON; try decoding it first:

import json

resp = '{"Info": "8\u6298", "Name": "\u5bb6\u4e50\u798f"}'
print json.loads(resp)

## {u'Info': u'8\u6298', u'Name': u'\u5bb6\u4e50\u798f'}
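
As a rough sketch of the same idea, the snippet below decodes the JSON text and then re-serializes it without escaping non-ASCII characters. It assumes the value read back from redis really is JSON text; the variable names are only illustrative, and the code is written to run on Python 3 (a raw string keeps the backslash escapes literal, as they would be in the stored value).

```python
import json

# Raw string so the \uXXXX escapes stay literal, as in the value read back from redis.
resp = r'{"Info": "8\u6298", "Name": "\u5bb6\u4e50\u798f"}'

# json.loads turns the JSON \uXXXX escapes into real Unicode characters.
data = json.loads(resp)
name = data["Name"]  # three characters, not an escaped string

# ensure_ascii=False keeps non-ASCII characters readable when serializing again.
readable = json.dumps(data, ensure_ascii=False, sort_keys=True)
```

Printing `name` or `readable` should then show the Chinese characters directly, provided the terminal encoding supports them.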


Answer 2:

You're just seeing the repr (representation) string. Displaying a dict shows the repr of each value, which escapes non-ASCII characters, but the string itself holds the same Unicode characters.

Try this:

Python2> d = {'Info': u'8\u6298', 'Name': u'\u5bb6\u4e50\u798f'}
Python2> d
{'Info': u'8\u6298', 'Name': u'\u5bb6\u4e50\u798f'}
Python2> print d["Name"]
家乐福

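Oh, but that is not what you are getting back. You are getting a plain, non-unicode string that contains literal \u escapes. One quick hack is to evaluate it as a unicode literal:

import ast
# d is the dict read back from redis; wrap the escaped text in a unicode
# literal so literal_eval can parse it (assumes no double quotes inside).
ast.literal_eval('u"%s"' % d["Name"])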

But better would be to figure out why the system is not round-tripping the unicode.
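
One possible way to make the round trip explicit is to serialize the dict to UTF-8 encoded JSON before storing it and to decode it after reading it back. The sketch below is only an illustration: `dump_record`, `load_record`, the key `shop:1`, and the plain dict `fake_store` standing in for a redis connection are all hypothetical names, not part of redis-py.

```python
import json


def dump_record(record):
    """Serialize a dict to UTF-8 encoded JSON bytes for storage."""
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def load_record(raw):
    """Decode UTF-8 JSON bytes read back from storage into a dict."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical stand-in for a redis connection: a plain dict of key -> bytes.
fake_store = {}

record = {"Info": u"8\u6298", "Name": u"\u5bb6\u4e50\u798f"}
fake_store["shop:1"] = dump_record(record)

# Reading back yields the original Unicode values, not escaped text.
restored = load_record(fake_store["shop:1"])
```

With a real redis client the two helper calls would wrap the store and fetch operations instead of the dict accesses.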



Answer 3:

When you put u' in front of 8\u6298, Python stores the value as a unicode instance, which has no encoding attached.

Before you put the data into redis, you have to encode the unicode instance to turn it into a real byte string.

You selected UTF-8 everywhere, so just use that:

>>> x=u'8\u6298'
>>> type(x)
<type 'unicode'>
>>> y=x.encode('utf8')
>>> type(y)
<type 'str'>
>>> y
'8\xe6\x8a\x98'
>>> print y
8折

Store y instead of x. Then, when you read it back from the database, the output will be the byte string '8\xe6\x8a\x98' (8折), not the escaped form '8\u6298' any more.
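
For reference, here is a minimal sketch of the same encode step together with the reverse decode, written so that it also runs on Python 3, where the two types are named `str` and `bytes` rather than `unicode` and `str`. The variable names are illustrative only.

```python
text = u"8\u6298"                    # Unicode text: 2 code points
encoded = text.encode("utf-8")       # UTF-8 bytes: b'8\xe6\x8a\x98'
decoded = encoded.decode("utf-8")    # back to Unicode text

# The character U+6298 takes three bytes in UTF-8, so the lengths differ.
text_length = len(text)
byte_length = len(encoded)
```

The encoded form is what goes into the database; decoding with the same codec after reading restores the original text.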



Answer 4:

If you want the Unicode version of the string, take a look here