This is a sample program I made:
>>> print u'\u1212'
ሒ
>>> print '\u1212'
\u1212
>>> print unicode('\u1212')
\u1212
Why do I get \u1212 instead of ሒ when I print unicode('\u1212')?
I'm making a program to store data and not print it, so how do I store ሒ instead of \u1212? Now obviously I can't do something like:
x = u''+unicode('\u1212')
Interestingly, even if I do that, here's what I get:
\u1212
Another fact that I think is worth mentioning:
>>> u'\u1212' == unicode('\u1212')
False
What do I do to store ሒ or some other character like that instead of \uxxxx?
This is just a misunderstanding.
This is a unicode string:
x = u'\u1212'
When you call
print x
it will print its character (ሒ) as shown. If you just evaluate x, it will show its representation, u'\u1212'. All is well with the world.
This is an ASCII (byte) string:
y = "\u1212"
When you call
print y
it will print its value (\u1212) as shown. If you just evaluate y, it will show its representation, '\\u1212'. Notice the double backslash (\\), which indicates that the backslash is being escaped.
So, let's look at the following function call:
print unicode('\u1212')
This is a function call, and we can replace the string with a variable, so we'll use the equivalent:
print unicode(y)
But as in the second example above, y is an ASCII string that is managed internally as '\\u1212'; it's not a unicode string at all. So the unicode representation of '\\u1212' is exactly the same, which is why it's not behaving the way you expected.
'\u1212' is an ASCII string with 6 characters: \, u, 1, 2, 1, and 2.
unicode('\u1212') is a Unicode string with 6 characters: \, u, 1, 2, 1, and 2.
u'\u1212' is a Unicode string with one character: ሒ.
You should use Unicode strings all around, if that's what you want.
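To make the six-characters-versus-one point concrete, here is a small sketch. The variable names are just for illustration, and it is written so that it should run on both Python 2.6+ and Python 3 (hence the explicit b and u prefixes and the doubled backslash). Decoding the bytes as ASCII is meant to mimic what unicode('\u1212') does on Python 2.

```python
# Six ASCII bytes: backslash, u, 1, 2, 1, 2 (the backslash is written doubled
# so the literal means the same thing on Python 2 and Python 3).
escaped = b'\\u1212'

# One Unicode character, U+1212.
single = u'\u1212'

escaped_len = len(escaped)  # counts the six separate bytes
single_len = len(single)    # counts the one character

# Decoding the bytes as ASCII keeps all six characters, so the result
# is still not equal to the one-character string.
as_text = escaped.decode('ascii')
same = (as_text == single)
```

Here escaped_len is 6, single_len is 1, and same is False, which matches the False comparison in the question.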
If for some reason you need to convert '\u1212' to u'\u1212', use the unicode-escape codec:
'\u1212'.decode('unicode-escape')
(Note that in Python 3, strings are always Unicode.)
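As a rough sketch of that conversion, assuming the standard unicode_escape codec is what you want: the snippet below starts from the six escaped bytes and decodes them into the single character. The names are illustrative, and the b and u prefixes are there so the same code should work on Python 2.6+ and Python 3.

```python
# The escaped form as stored: six ASCII bytes, not one character.
escaped = b'\\u1212'

# The unicode_escape codec interprets the \uXXXX sequence and
# produces the actual character.
decoded = escaped.decode('unicode_escape')

# On Python 3, where the escaped text is usually a str rather than bytes,
# encode it to ASCII first and then decode the same way.
escaped_text = u'\\u1212'
decoded_from_text = escaped_text.encode('ascii').decode('unicode_escape')
```

Both decoded and decoded_from_text end up equal to u'\u1212', a string of length 1. Keep in mind that this codec interprets every backslash escape in the input, so it is only appropriate when the stored data really is escaped text.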