Unicode and encoding for Persian or Arabic in pyth

Posted 2020-02-16 02:31

Question:

I have a chunk of code like this:

city_name = obj['city_from']['name'].encode('utf-8')
print(city_name)

The output from this code is:

b'\xd8\xa8\xd9\x86\xd8\xaf\xd8\xb1\xd8\xb9\xd8\xa8\xd8\xa7\xd8\xb3'

If I remove encode('utf-8'), I get this error instead:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)

The text is in Persian (similar to Arabic script). Why does the string class in Python 3 not have a decode method? Is there a solution to this problem?

thanks

Answer 1:

Your answer (writing the bytes to sys.stdout.buffer) shows that your terminal accepts UTF-8 byte sequences.

You don't need to convert Unicode strings into bytes before printing them. Python does it for you, using the encoding of sys.stdout.
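As a minimal sketch of the str/bytes split in Python 3 (the variable names `raw` and `text` are illustrative, not from the question): `decode()` exists only on bytes and `encode()` only on str, which is why the string class has no decode method.

```python
# The bytes printed in the question are the UTF-8 encoding of the city name.
raw = b'\xd8\xa8\xd9\x86\xd8\xaf\xd8\xb1\xd8\xb9\xd8\xa8\xd8\xa7\xd8\xb3'

# bytes -> str: decode() lives on bytes, because bytes are the encoded form.
text = raw.decode('utf-8')

# str -> bytes: encode() lives on str, because str is already decoded text.
assert text.encode('utf-8') == raw

# 16 bytes on the wire, 8 Unicode characters in the string.
print(len(raw), len(text))

# ascii() gives an escape-only view that is safe to print on any terminal.
print(ascii(text))
```

The 8 characters match "position 0-7" in the error message from the question.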

To change the character encoding that Python uses for I/O, set the PYTHONIOENCODING=utf-8 environment variable or change your locale settings.

It looks like sys.stdout.encoding is ascii in your case.

$ python3 -c'import sys; print(sys.stdout.encoding)' 
UTF-8
$ python3 -c'import sys; print(sys.stdout.encoding)' | cat
ascii
$ LC_CTYPE=C python3 -c'import sys; print(sys.stdout.encoding)' 
ANSI_X3.4-1968

ANSI_X3.4-1968 is a canonical name for ascii.
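A small sketch that checks the alias and reproduces the error from the question without touching the real terminal. It assumes that printing to an ASCII stdout fails in effectively the same way as an explicit encode('ascii'); the name `city` is illustrative.

```python
import codecs

# Both names resolve to the same codec in Python's codec registry.
assert codecs.lookup('ANSI_X3.4-1968').name == 'ascii'

# The city name from the question, written with escapes so this file is pure ASCII.
city = '\u0628\u0646\u062f\u0631\u0639\u0628\u0627\u0633'

# Encoding with ASCII is roughly what print() has to do when
# sys.stdout.encoding is ascii, and it fails for non-ASCII characters.
try:
    city.encode('ascii')
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```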

$ PYTHONIOENCODING=uTf-8 python3 -c'import sys; print(sys.stdout.encoding)' | cat
uTf-8
$ LC_CTYPE=C.UTF-8 python3 -c'import sys; print(sys.stdout.encoding)' 
UTF-8

Don't hardcode the character encoding inside your scripts. Print Unicode strings and configure your environment appropriately instead.
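To illustrate that the environment, not the script, decides the output encoding, here is a hedged sketch that runs a child Python process with PYTHONIOENCODING set and captures the bytes it prints. The names `child_code` and `city` are made up for the example; the child script itself contains only a plain print() of a Unicode string.

```python
import os
import subprocess
import sys

# The city name from the question.
city = '\u0628\u0646\u062f\u0631\u0639\u0628\u0627\u0633'

# The child just prints a Unicode string; it never mentions an encoding.
# A raw string keeps the \u escapes intact so the command line stays ASCII.
child_code = r"print('\u0628\u0646\u062f\u0631\u0639\u0628\u0627\u0633')"

# Configure the encoding from the outside, through the environment.
env = dict(os.environ, PYTHONIOENCODING='utf-8')

result = subprocess.run(
    [sys.executable, '-c', child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# The child's stdout is a pipe here, yet the output is still UTF-8 bytes.
assert result.stdout.strip() == city.encode('utf-8')
print(ascii(result.stdout.decode('utf-8').strip()))
```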



Answer 2:

OK, I found a solution and it works like a charm:

import sys
# TestText2 must be a bytes object, e.g. the UTF-8 encoded city name
sys.stdout.buffer.write(TestText2)

UPDATE: the problem was specific to my zsh environment. When I use bash, everything is fine.
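Why writing to sys.stdout.buffer works can be sketched with an in-memory stand-in for the terminal. This is only an illustration, assuming stdout is a text layer with ascii encoding over a binary buffer; `fake_buffer` and `fake_stdout` are hypothetical names, not part of the original code.

```python
import io

# The city name from the question.
city = '\u0628\u0646\u062f\u0631\u0639\u0628\u0627\u0633'

# A stand-in for the terminal: a binary buffer wrapped by an ASCII text layer,
# mimicking sys.stdout when sys.stdout.encoding is ascii.
fake_buffer = io.BytesIO()
fake_stdout = io.TextIOWrapper(fake_buffer, encoding='ascii')

# Writing text goes through the text layer, which must encode with ASCII.
try:
    fake_stdout.write(city)
    fake_stdout.flush()
    text_write_failed = False
except UnicodeEncodeError:
    text_write_failed = True

# Writing pre-encoded bytes to the underlying buffer skips the text layer.
fake_stdout.buffer.write(city.encode('utf-8'))

print(text_write_failed, len(fake_buffer.getvalue()))
```

The bytes only display correctly if the terminal itself expects UTF-8, so this bypasses Python's encoding choice rather than fixing the environment.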