Private Unicode Character displays differently in

Posted 2019-08-03 03:51

Question:

I created a private Unicode character using the Private Character Editor on Windows 10. The character was saved with the code E000. I copied it from the Character Map and pasted it into a text editor, and it worked. However, when I paste it into the Python IDLE editor, it changes to a different Unicode character, even before I run the program. I can't use u'unicode_string' or anything like that because my Unicode character doesn't even work in the interpreter. I am new to programming.

My question is: how do I use my private Unicode character in Python 3.4?

This is what I see in Notepad.

This is what I see in the Python 3.4 interpreter.

Answer 1:

Python isn't really the interesting part of this; the shell or terminal is. In this case, Windows stores privately defined characters at special code points. To use one, get a hex dump of the character in a shell on Windows, and then you can render the character in Python.

NOTE: Use code points U+E021 or higher. Lower code points are usually used for control, and it seems that the Windows shell used by the Python interpreter and IDLE doesn't let you override those with private characters.
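If you want to see how Unicode itself classifies a code point, the standard library's unicodedata module can tell you. This is only a rough sketch: is_private_use is a helper name I made up for illustration. As far as the Unicode standard is concerned, everything from U+E000 through U+F8FF is in the Private Use Area (general category "Co"), so any trouble with the lower values would come from the shell or font setup rather than from Unicode.

```python
import unicodedata


def is_private_use(ch):
    """Return True if ch has the Unicode general category 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


# U+0007 (BEL) is included as an example of a real control character ('Cc').
for cp in (0xE000, 0xE021, 0x0007):
    ch = chr(cp)
    # Only ASCII text is printed, so this is safe in any console encoding.
    print("U+%04X" % cp, unicodedata.category(ch), is_private_use(ch))
```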

Demonstration

I tested your issue by generating a private character of my own. I will put an image of my test here, since the character wouldn't render properly as text on Stack Overflow.

Explanation

I used the Character Map program in Windows 10 to copy the symbol and paste it into my Python environment. The environment may truncate it on the right, since it is a wide character and the environment didn't seem to like that. (I moved the cursor around to get it to render at full width.)

Then I got the hex dump of the code point by encoding the character with the default UTF-8 encoding, which turned out to be \xee\x80\xa1 as a bytes object.
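The encoding step can be reproduced without pasting the symbol at all. A minimal sketch, assuming the test character sits at U+E021 (that is the code point the bytes \xee\x80\xa1 decode to); the variable names are just for illustration:

```python
# Assumption: the private character was saved at U+E021.
symbol = "\ue021"

# Encode the one-character string to UTF-8 bytes.
encoded = symbol.encode("utf-8")

# Format each byte as two hex digits to get a readable hex dump.
hex_dump = " ".join("%02x" % b for b in encoded)

print(encoded)   # b'\xee\x80\xa1'
print(hex_dump)  # ee 80 a1
```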

Next I printed the data as a string to show you a common error, and what would be printed if you attempted to print a string of those bytes.

Then, I printed b'\xee\x80\xa1', which is how you would actually use the symbol in your software.
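Since the screenshot doesn't show up here, the following is my best guess at the difference between the two steps, not an exact transcript. I'm assuming the "common error" is calling str() on the bytes object, which gives you the text of the literal rather than the character. Decoding the bytes as UTF-8 is what gets you back the one-character string.

```python
raw = b"\xee\x80\xa1"

# Likely mistake: str() on bytes returns the literal's text, b'\xee\x80\xa1',
# as 15 ordinary characters. It does not decode anything.
wrong = str(raw)

# Decoding as UTF-8 turns the three bytes back into a single character.
right = raw.decode("utf-8")

print(len(wrong), len(right))  # 15 1
print(ascii(right))            # '\ue021'
```

Whether the decoded character then shows up as your custom glyph depends on the console or editor and its font, which is outside Python's control.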



Answer 2:

You can use the \u escape sequence in your Python source code, like so:

my_unicode_string = 'This is my character: \ue000'
print(my_unicode_string)
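If you'd rather build the character from its number, chr() gives the same result as the escape. A small sketch (the variable names are mine); ascii() is used for printing so the example works even in a console that can't display the glyph:

```python
# Two equivalent ways to get the character stored at code point E000.
from_escape = "\ue000"
from_chr = chr(0xE000)

message = "This is my character: " + from_chr

print(from_escape == from_chr)  # True
print(ascii(message))           # 'This is my character: \ue000'
```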