Can I turn off implicit Python unicode conversions

2020-05-29 00:19发布

问题:

When profiling our code I was surprised to find millions of calls to
C:\Python26\lib\encodings\utf_8.py:15(decode)

I started debugging and found that across our code base there are many small bugs, usually comparing a string to a unicode or adding a sting and a unicode. Python graciously decodes the strings and performs the following operations in unicode.

How kind. But expensive!

I am fluent in unicode, having read Joel Spolsky and Dive Into Python...

I try to keep our code internals in unicode only.

My question - can I turn off this pythonic nice-guy behavior? At least until I find all these bugs and fix them (usually by adding a u'u')?

Some of them are extremely hard to find (a variable that is sometimes a string...).

Python 2.6.5 (and I can't switch to 3.x).

回答1:

The following should work:

>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('undefined')
>>> u"abc" + u"xyz"
u'abcxyz'
>>> u"abc" + "xyz"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/encodings/undefined.py", line 22, in decode
    raise UnicodeError("undefined encoding")
UnicodeError: undefined encoding

reload(sys) in the snippet above is only necessary here since normally sys.setdefaultencoding is supposed to go in a sitecustomize.py file in your Python site-packages directory (it's advisable to do that).