Linux/Python: encoding a unicode string for print

Posted 2020-02-09 06:47

I have a fairly large python 2.6 application with lots of print statements sprinkled about. I'm using unicode strings throughout, and it usually works great. However, if I redirect the output of the application (like "myapp.py >output.txt"), then I occasionally get errors such as this:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

I guess the same issue comes up if someone has set their LOCALE to ASCII. Now, I understand perfectly well the reason for this error. There are characters in my Unicode strings that are not possible to encode in ASCII. Fair enough. But I'd like my python program to make a best effort to try to print something understandable, maybe skipping the suspicious characters or replacing them with their Unicode ids.

This problem must be common... What is the best practice for handling this problem? I'd prefer a solution that allows me to keep using plain old "print", but I can modify all occurrences if necessary.

PS: I have now solved this problem. The solution was neither of the answers given. I used the method given at http://wiki.python.org/moin/PrintFails , as given by ChrisJ in one of the comments. That is, I replace sys.stdout with a wrapper that calls unicode encode with the correct arguments. Works very well.

3 Answers
倾城 Initia
Reply #2 · 2020-02-09 07:16

If you're dumping to an ASCII terminal, encode manually using unicode.encode, and specify that errors should be ignored.

u = u'\xa0'
u.encode('ascii') # This fails with UnicodeEncodeError
u.encode('ascii', 'ignore') # This drops the characters that cannot be encoded, giving ''
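
The question also asks about replacing characters rather than skipping them. As a rough sketch, here are the standard error handlers that encode() accepts besides 'ignore'. The sample string is made up for illustration, and on Python 3 the results are bytes objects rather than str.

```python
# Sketch: compare the standard error handlers accepted by encode().
# The sample text below is invented for this example.
text = u'\xa1Hola!'

ignored = text.encode('ascii', 'ignore')             # drops the character
replaced = text.encode('ascii', 'replace')           # substitutes '?'
escaped = text.encode('ascii', 'backslashreplace')   # keeps the code point as \xa1
xmlref = text.encode('ascii', 'xmlcharrefreplace')   # emits &#161;
```

'backslashreplace' is probably the closest match to "replacing them with their Unicode ids", since the original code point stays readable in the output.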

If you want to store unicode files, try this:

u = u'\xa0'
print >>open('out', 'w'), u # This fails
print >>open('out', 'w'), u.encode('utf-8') # This is ok
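
An alternative to encoding by hand on every write, sketched here under the assumption that the file should be UTF-8: io.open (available since Python 2.6) takes an encoding argument and does the conversion itself. The temporary directory is only there to keep the example self-contained.

```python
import io
import os
import tempfile

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'out')

# The file object encodes unicode text to UTF-8 on write.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\xa0')

# Read the raw bytes back to see what was actually stored.
with io.open(path, 'rb') as f:
    raw = f.read()
```

The with block also closes the file, which the one-line print >>open(...) form leaves to the garbage collector.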
霸刀☆藐视天下
Reply #3 · 2020-02-09 07:27

Either route all your print statements through a function that performs the unicode -> utf8 conversion or, as a last resort, change the Python default encoding from ascii to utf-8 inside your site.py. In general it is a bad idea to print unicode strings unfiltered to sys.stdout: when stdout has no encoding of its own (for example when it is redirected to a file), Python will trigger an implicit conversion of the unicode strings to the configured default encoding, which is ascii.
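
A minimal sketch of the first option, assuming UTF-8 is the encoding you want. The function name print_utf8 is made up for this example; it is not something Python provides.

```python
import sys

def print_utf8(text, stream=None):
    """Encode text as UTF-8 and write it, plus a newline, to a byte stream."""
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # on Python 2, sys.stdout itself accepts byte strings.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write(text.encode('utf-8') + b'\n')
```

Calling print_utf8(u'\xa1Hola') in place of print u'\xa1Hola' then behaves the same whether or not the output is redirected, because the encoding no longer depends on what stdout reports.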

迷人小祖宗
Reply #4 · 2020-02-09 07:33

I have now solved this problem. The solution was neither of the answers given. I used the method given at http://wiki.python.org/moin/PrintFails , as given by ChrisJ in one of the comments. That is, I replace sys.stdout with a wrapper that calls unicode encode with the correct arguments. Works very well.
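
For readers who want to see the shape of such a wrapper, here is one way it might look. This is a sketch built on codecs.getwriter and is not copied from the wiki page; the helper name wrap_stream and its 'ascii' / 'backslashreplace' defaults are assumptions chosen for the example.

```python
import codecs
import io

def wrap_stream(byte_stream, encoding='ascii', errors='backslashreplace'):
    """Return a writer that encodes unicode text before writing to byte_stream."""
    # codecs.getwriter() returns a StreamWriter class for the given encoding;
    # instantiating it with errors=... sets how unencodable characters are handled.
    return codecs.getwriter(encoding)(byte_stream, errors=errors)

# Demonstration against an in-memory byte stream instead of the real stdout.
demo = io.BytesIO()
writer = wrap_stream(demo)
writer.write(u'\xa1Hola!\n')
captured = demo.getvalue()
```

On Python 2 the wrapper would be installed once at startup with sys.stdout = wrap_stream(sys.stdout), after which plain print statements with unicode strings go through it. On Python 3 the byte stream to wrap would be sys.stdout.buffer.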
