Suppress the u'prefix indicating unicode'

Posted 2019-01-01 07:14

Is there a way to globally suppress the unicode string indicator in Python? I'm working exclusively with unicode in an application, and do a lot of interactive stuff. Having the u'prefix' show up in all of my debug output is unnecessary and obnoxious. Can it be turned off?

11 answers
时光乱了年华
#2 · 2019-01-01 07:43

I'm not sure about unicode specifically, but in general you can convert between text and bytes with encode() and decode(). For instance, subprocess output captured in Python 3.0+ arrives as a byte string (shown with a b prefix), and calling .decode() on it turns it into a regular str.
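A minimal Python 3 sketch of the subprocess case. The child process is just another Python interpreter (an assumption, chosen so the example runs anywhere), and decode("utf-8") assumes the child wrote UTF-8.

```python
import subprocess
import sys

# Run a child Python that prints one line; check_output returns bytes on Python 3
raw = subprocess.check_output([sys.executable, "-c", "print('hello')"])
print(repr(raw))   # b'hello\n' (b'hello\r\n' on Windows)

# Decode the bytes into a str; assumes the child's output is UTF-8
text = raw.decode("utf-8")
print(repr(text))  # 'hello\n' with no b prefix
```

On Python 3.7+ you can instead pass text=True (universal_newlines=True on older versions) to subprocess.check_output and get a str back directly, decoded with the locale's preferred encoding.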

呛了眼睛熬了心
#3 · 2019-01-01 07:47

Try the following:

print str(result.url)

It could be that your default encoding has been changed.

You can check your default encoding with the following:

>>> import sys
>>> print sys.getdefaultencoding()
ascii

The default should be ascii, which means u'string' should be printed as 'string', but yours may have been modified.

像晚风撩人
#4 · 2019-01-01 07:50

If you do not want to upgrade to Python 3, you could use slicing. For example, say the original output was (u'mystring',), and the variable row holds that one-element tuple. Then you could do something like this:

temp = str(row)    # text form of the tuple: "(u'mystring',)"
temp = temp[:-3]   # drop the trailing "',)"
print temp[3:]     # drop the leading "(u'" and print: mystring
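The slicing above depends on the exact Python 2 repr of the tuple; on Python 3 the repr has no u, so the same slices would cut off the first letter. A simpler route (not part of the original answer) is to index the tuple. This sketch assumes row is a one-element tuple such as a database cursor might return:

```python
row = (u"mystring",)  # hypothetical one-column row

# Take the element itself instead of slicing the tuple's repr;
# this behaves the same on Python 2 and Python 3
value = row[0]
print(value)  # mystring
```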
永恒的永恒
#5 · 2019-01-01 07:52

Using str(text) is in fact a somewhat bad idea whenever you cannot be 100% sure about both your Python's default encoding and the exact content of the string; the latter is typical of text fetched from the internet. Also, depending on what you want to do, using print text.encode('utf-8') or print repr(text.encode('utf-8')) may yield disappointing results, as you might get output full of unreadable byte escapes like \xc3\xa9.
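A small Python 3 illustration of the escape problem: once the text is encoded to UTF-8, its repr shows bytes rather than characters. The string "café" is an arbitrary example, written with a \u escape so the source stays ASCII.

```python
text = u"caf\u00e9"             # the four-character string "cafe" with an acute e
encoded = text.encode("utf-8")  # five bytes: the accented e becomes 0xC3 0xA9

shown = repr(encoded)
print(shown)  # b'caf\xc3\xa9' on Python 3 (Python 2 prints it without the b)
```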

I think the best option is really to use a Unicode-capable command line (difficult under Windows, easy under Linux) and switch from Python 2.x to Python 3.x. The ease and clarity of text vs. bytes handling afforded by the new Python 3 series is really one of the big gains you can expect. It does mean you'll have to spend a little time learning the distinction between 'bytes' and 'text' and grasping the concept of character encodings, but that time is much better spent in a Python 3 environment, as Python's new approach to these vexing problems is much clearer and much less error-prone than what Python 2 had to offer. I'd go so far as to call Python 2's approach to Unicode problematic in retrospect, although I used to think of it as superior when I compared it to the way this issue is handled in PHP.
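A minimal Python 3 sketch of the text vs. bytes distinction described above: str holds code points, bytes holds an encoded form, conversion is always explicit, and the repr of text carries no u prefix. The sample word is arbitrary.

```python
text = "na\u00efve"          # str: the 5 code points of "naive" with a diaeresis on the i
data = text.encode("utf-8")  # bytes: the encoded form used for files and sockets

print(type(text).__name__, len(text))  # str 5
print(type(data).__name__, len(data))  # bytes 6 (the i with diaeresis takes two bytes)
print(data.decode("utf-8") == text)    # True: decoding reverses encoding

# Python 3 refuses to guess an encoding when the two types are mixed
try:
    text + data
except TypeError:
    print("str and bytes cannot be concatenated without an explicit encode/decode")

print(repr("plain"))  # 'plain': text reprs have no u prefix in Python 3
```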

Edit: I just stopped by a related discussion here on SO and found this comment on the way PHP these days appears to tackle Unicode/encoding issues:

It's like a mouse trying to eat an elephant. By framing Unicode as an extension of ASCII (we have normal strings and we have mb_strings) it gets things the wrong way around, and gets hung up on what special cases are required to deal with characters with funny squiggles that need more than one byte. If you treat Unicode as providing an abstract space for any character you need, ASCII is accommodated in that without any need to treat it as a special case.

I quote this here because, in my experience, 90% of all SO Python+Unicode topics seem to come from people who used to be fine with ASCII or maybe Latin-1, got bitten by the occasional character that was not supported in their usual settings, and then basically just want to get rid of it. What you do when switching to Python 3 is exactly what the commenter above suggests: instead of viewing Unicode as a vexing extension of ASCII, you start to view ASCII (and almost any other encoding you'll ever meet) as a subset of Unicode.
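A quick Python 3 check of the "ASCII is a subset" view: every ASCII character encodes to the same single byte under ASCII, Latin-1 and UTF-8, and the encodings only diverge beyond that range. The choice of Latin-1 and UTF-8 for comparison is illustrative.

```python
# All 128 ASCII characters produce identical bytes in all three encodings
for code_point in range(128):
    ch = chr(code_point)
    assert ch.encode("ascii") == ch.encode("latin-1") == ch.encode("utf-8")

# Outside ASCII the encodings differ, which is where the trouble starts
print(repr("\u00e9".encode("latin-1")))  # b'\xe9'
print(repr("\u00e9".encode("utf-8")))    # b'\xc3\xa9'
```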

To be fair, Unicode 6 is certainly not the last word in encodings, but it is as close to universal as you can get in 2011. Get used to it.

无色无味的生活
#6 · 2019-01-01 07:56

If you are seeing something like [u'hello'], then you are printing a list. Use print str(arr[0]) and you are good to go.
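A small sketch of the list case, assuming a hypothetical list arr of unicode strings. Printing an element directly also shows no prefix, and unlike str() on Python 2 it does not first encode to ASCII (which raises UnicodeEncodeError for non-ASCII text). The single-argument print(...) calls below run on both Python 2 and 3.

```python
arr = [u"hello", u"world"]  # hypothetical list of unicode strings

print(arr)              # Python 2 shows [u'hello', u'world']
print(arr[0])           # hello: printing the element itself shows no prefix
print(u", ".join(arr))  # hello, world
```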

冷夜・残月
#7 · 2019-01-01 08:01

from __future__ import unicode_literals

It has been available since Python 2.6 (released on October 1, 2008) and is the default behavior in Python 3.

It lets you omit the u'' prefix in source code, though it does not change repr(unicode_string); changing that would be misleading.
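A minimal sketch of what unicode_literals does and does not change. The future import must be the first statement in the module (only comments or a docstring may precede it); the sample string is arbitrary.

```python
# unicode_literals must come before any other statement in the module
from __future__ import unicode_literals

text_type = type(u"")  # unicode on Python 2, str on Python 3

s = "no prefix needed"  # already a text string on both versions
assert isinstance(s, text_type)

# The source lost its u'' prefixes, but repr() is unchanged:
# Python 2.6+ prints u'no prefix needed', Python 3 prints 'no prefix needed'
print(repr(s))
```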

You could override sys.displayhook in a Python REPL to display objects however you like. You could also override __repr__ for your own custom objects.
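A sketch of the sys.displayhook approach, written to run on both Python 2 and 3. The names plain_repr and plain_displayhook are hypothetical helpers, not part of any library. The hook strips the u prefix from text strings, including those inside plain lists, tuples and dicts such as (u'mystring',); other objects fall back to repr(), and on Python 2 non-ASCII characters still appear as \u escapes. Loading it from the file named by the PYTHONSTARTUP environment variable is one way to make it apply to every interactive session.

```python
import sys

try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

text_type = type(u"")  # unicode on Python 2, str on Python 3


def plain_repr(obj):
    """repr() variant that drops the u'' prefix from text strings.

    Recurses into plain lists, tuples and dicts so that (u'x',) and [u'x']
    also lose their prefixes; every other object falls back to repr().
    """
    if isinstance(obj, text_type):
        r = repr(obj)
        # Python 2 renders u'...'; Python 3 text reprs never start with "u"
        return r[1:] if r.startswith("u") else r
    if type(obj) is list:
        return "[" + ", ".join(plain_repr(x) for x in obj) + "]"
    if type(obj) is tuple:
        inner = ", ".join(plain_repr(x) for x in obj)
        return "(" + inner + ("," if len(obj) == 1 else "") + ")"
    if type(obj) is dict:
        items = (plain_repr(k) + ": " + plain_repr(v) for k, v in obj.items())
        return "{" + ", ".join(items) + "}"
    return repr(obj)


def plain_displayhook(value):
    """Replacement for sys.displayhook that echoes results via plain_repr()."""
    if value is None:
        return  # like the default hook, show nothing for None
    builtins._ = None  # the default hook also clears _ before printing
    sys.stdout.write(plain_repr(value) + "\n")
    builtins._ = value  # keep "_ holds the last result" working


# Install the hook, e.g. from the file named by PYTHONSTARTUP
sys.displayhook = plain_displayhook
```

With the hook installed, evaluating (u'mystring',) at the prompt echoes ('mystring',) instead of (u'mystring',).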
