Proper way to print unicode characters to the console

Posted 2019-04-11 15:06

I am looking for a way to print unicode characters to a UTF-8 aware Linux console, using Python 2.x's print method.

What I get is:

$ python2.7 -c "print u'é'"
Ã©

What I want:

$ python2.7 -c "print u'é'"
é

Python detects correctly that the console is configured for UTF-8.

$ python2.7 -c "import sys; print sys.stdout.encoding"
UTF-8

I have looked at 11741574, but the proposed solution uses sys.stdout, whereas I am looking for a solution using print.

I have also looked at 5203105, but using the encode method does not fix anything.

$ python -c "print u'é'.encode('utf8')"
Ã©

SOLUTIONS

As suggested by @KlausD. and @itzmeontv

$ python2.7 -c "print 'é'"
é

As suggested by @PM2Ring

$ python -c "# coding=utf-8
> print u'é'"
é

See the accepted answer for an explanation about the cause of the issue.

3 answers
Ridiculous、
#2 · 2019-04-11 15:39

The problem isn't printing to the console; it's how Python interprets the -c argument from the command line:

$ python -c "print repr('é')"
'\xc3\xa9' # OK, expected byte string
$ python -c "print repr('é'.decode('utf-8'))"
u'\xe9' # OK, byte string decoded explicitly
$ python -c "print repr(u'é')"
u'\xc3\xa9' # bad, decoded implicitly as iso-8859-1
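
The same three results can be reproduced with explicit bytes. This is only a sketch, written in Python 3 syntax so it runs on a current interpreter (the thread itself is about Python 2); the names raw, good and bad are made up for illustration.

```python
# The UTF-8 terminal hands Python 'é' as two bytes.
raw = b"\xc3\xa9"

# Decoding with the matching codec yields a single character.
good = raw.decode("utf-8")

# Decoding as ISO-8859-1 turns each byte into its own character.
bad = raw.decode("iso-8859-1")

print(ascii(good))  # '\xe9'
print(ascii(bad))   # '\xc3\xa9'

# Printing the wrongly decoded string to a UTF-8 console encodes both
# characters again, so four bytes reach the terminal instead of two.
print(bad.encode("utf-8"))  # b'\xc3\x83\xc2\xa9'
```

The last line also suggests why calling .encode('utf8') on the already wrong string does not help: the damage happened at decode time, before any encoding.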

The problem seems to be that Python doesn't know what encoding the command-line argument uses, so you get the same kind of problem as if a source code file had the wrong encoding. In that case you would tell Python which encoding the source uses with a coding comment, and you can do that here too:

$ python -c "# coding=utf-8
print repr(u'é')"
u'\xe9'
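
To see the coding comment at work without a terminal in the way, here is a rough Python 3 sketch that compiles the same source bytes under two different declarations. The helper run() and the "<cmdline>" filename are hypothetical, and Python 3's defaults differ from the Python 2 behaviour discussed above, so treat it as an illustration of the mechanism only.

```python
# The literal is the same two bytes in both sources; only the coding
# declaration on the first line differs.
src_utf8 = b"# coding=utf-8\ns = '\xc3\xa9'\n"
src_latin1 = b"# coding=iso-8859-1\ns = '\xc3\xa9'\n"


def run(source):
    """Execute byte-string source and return the value it binds to s."""
    namespace = {}
    exec(compile(source, "<cmdline>", "exec"), namespace)
    return namespace["s"]


print(ascii(run(src_utf8)))    # '\xe9'
print(ascii(run(src_latin1)))  # '\xc3\xa9'
```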

Generally I'd try to avoid Unicode on the command line though, especially if you might ever have to run on Windows where the story is much worse.

叼着烟拽天下
#3 · 2019-04-11 15:40

Try this if you want to print to the console:

$ python -c "print 'é'"
é
迷人小祖宗
#4 · 2019-04-11 15:45

This is ugly, due to the problems mentioned by bobince.

But you can get what you want by undoing the implicit decode: encode the string back to bytes as iso-8859-1 (aka latin-1), which recovers the bytes the console actually sent, then decode those bytes as UTF-8.

$ python -c "s=u'é';print unicode(s.encode('iso-8859-1'), 'utf-8')"
é

$ python -c "s=u'é';print unicode(s.encode('latin-1'), 'utf-8')"
é
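
The round trip in the two commands above can be wrapped in a small helper. This is a sketch in Python 3 syntax, not the answer's own code; the function name undo_latin1_misdecode is invented for illustration.

```python
def undo_latin1_misdecode(text):
    """Reverse a UTF-8 byte sequence that was wrongly decoded as ISO-8859-1."""
    # ISO-8859-1 maps code points 0-255 straight to bytes 0-255, so encoding
    # recovers the original bytes exactly; those are then decoded as UTF-8.
    return text.encode("iso-8859-1").decode("utf-8")


# Simulate the implicit decode from the thread, then repair it.
garbled = b"\xc3\xa9".decode("iso-8859-1")
fixed = undo_latin1_misdecode(garbled)

print(ascii(garbled))  # '\xc3\xa9'
print(ascii(fixed))    # '\xe9'
```

The helper only works on strings that really went through that wrong decode: it raises UnicodeEncodeError if the text contains characters outside ISO-8859-1, and UnicodeDecodeError if the recovered bytes are not valid UTF-8.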