Difference between u'string' and unicode(s

Posted 2019-05-13 18:28

This is a sample program I made:

>>> print u'\u1212'
ሒ
>>> print '\u1212'
\u1212
>>> print unicode('\u1212')
\u1212

Why do I get \u1212 instead of ሒ when I print unicode('\u1212')?

I'm making a program to store data, not print it, so how do I store ሒ instead of \u1212? Obviously I can't do something like:

x = u''+unicode('\u1212')

Interestingly, even if I do that, here's what I get:

\u1212

Another fact that I think is worth mentioning:

>>> u'\u1212' == unicode('\u1212')
False

What do I do to store ሒ, or some other character like it, instead of \uxxxx?

2 Answers
冷血范
#2 · 2019-05-13 19:03

This is just a misunderstanding.

This is a unicode string: x = u'\u1212'

When you call print x, it prints the character (ሒ), as shown. If you just evaluate x at the prompt, it shows its representation:

u'\u1212'

All is well with the world.

This is an ASCII byte string (a plain str in Python 2): y = "\u1212"

When you call print y, it prints its value (\u1212), as shown. If you just evaluate y at the prompt, it shows its representation:

'\\u1212'

Notice the double backslash (\\), which indicates that the backslash is a literal character and not the start of an escape sequence.

So, let's look at the following function call: print unicode('\u1212')

This is a function call, and we can replace the string with a variable, so we'll use the equivalent:

y = "\u1212"
print unicode(y)

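But as in the second example above, y is an ASCII byte string whose actual contents are the six characters \u1212 (shown as '\\u1212' in its representation); it does not contain a Unicode escape at all. unicode(y) decodes those six characters one for one, so the result is the six-character Unicode string u'\\u1212' and not the single character ሒ. That is why it does not behave the way you expected.
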
淡お忘
#3 · 2019-05-13 19:07

'\u1212' is an ASCII string with 6 characters: \, u, 1, 2, 1, and 2.

unicode('\u1212') is a Unicode string with 6 characters: \, u, 1, 2, 1, and 2.

u'\u1212' is a Unicode string with one character: ሒ.
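
The character counts above can be checked directly. This is a minimal sketch, assuming Python 2.7 or Python 3.3+; the b prefix is used so that the six-character value means the same thing on both versions (in Python 2, '\u1212' and b'\\u1212' are the same str), and the variable names are only illustrative:

```python
# Byte string: six separate characters, no escape is interpreted.
escaped = b'\\u1212'
# Unicode string: the \u escape is interpreted, giving one code point (U+1212).
single = u'\u1212'

print(len(escaped))  # 6
print(len(single))   # 1

# Decoding the six ASCII bytes (what unicode('\u1212') does in Python 2)
# still yields six characters, so the two values are not equal.
print(escaped.decode('ascii') == single)  # False
```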

You should use Unicode strings all around, if that's what you want.

u'\u1212'

If for some reason you need to convert '\u1212' to u'\u1212', use

'\u1212'.decode('unicode-escape')

(Note that in Python 3, strings are always Unicode.)
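
For the storage case in the question, the conversion could be wrapped in a small helper. This is only a sketch: unescape is a hypothetical name, and it assumes the input is a byte string containing literal \uXXXX sequences. In Python 3 the value has to be a bytes object, because str has no decode method there. The unicode-escape codec also treats any non-ASCII bytes as Latin-1, so it is best suited to pure-ASCII input.

```python
def unescape(raw):
    # Hypothetical helper: interpret literal \uXXXX sequences in a byte
    # string and return the corresponding Unicode string.
    return raw.decode('unicode_escape')

stored = unescape(b'\\u1212')

print(len(stored))            # 1
print(stored == u'\u1212')    # True
```

The resulting value is the single character U+1212, so it can be stored or compared like any other Unicode string.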
