Does python support unicode beyond basic multiling

Posted 2020-06-20 16:24

Below is a simple test. repr seems to work fine, yet len and x for x in don't seem to split the Unicode text correctly in Python 2.6 and 2.7:

In [1]: u"                

1 answer
一纸荒年 Trace。
#2 · 2020-06-20 16:46

Yes, provided you compiled your Python with wide-unicode support.

By default, Python 2 is built with narrow Unicode support only. Enable wide support when building from source with:

./configure --enable-unicode=ucs4

You can verify what configuration was used by testing sys.maxunicode:

import sys
if sys.maxunicode == 0x10FFFF:
    print 'Python built with UCS4 (wide unicode) support'
else:
    print 'Python built with UCS2 (narrow unicode) support'

A wide build stores every character of every unicode value as a 4-byte UCS4 unit, doubling memory usage compared with a narrow build. Python 3.3 switched to a variable-width representation: each string uses only as many bytes per character as its widest character requires.
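To see the variable-width storage in action, here is a rough sketch that estimates the per-character cost of a string from sys.getsizeof. It assumes CPython 3.3 or later, where getsizeof reflects the internal string layout; the helper name bytes_per_char is made up for this illustration, and other interpreters may report different numbers.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by comparing two string lengths.

    Hypothetical helper: relies on CPython's sys.getsizeof, so the
    fixed object header cancels out in the subtraction.
    """
    short = ch * 10
    longer = ch * (10 + n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

samples = [
    ("ASCII", u"a"),                      # fits in 1 byte
    ("BMP", u"\u0394"),                   # Greek capital delta
    ("beyond BMP", u"\U0002f920"),        # character from the question's demo
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such an interpreter the three figures should come out as 1, 2 and 4 bytes, so only strings that actually contain characters beyond the BMP pay the UCS4 price.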

Quick demo showing that a wide build handles your sample Unicode string correctly:

$ python2.6
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> [x for x in u'\U0002f920\U0002f921']
[u'\U0002f920', u'\U0002f921']
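If rebuilding Python is not an option, a narrow build can still be made to iterate by code point, because it stores each character beyond the BMP as a UTF-16 surrogate pair. The following is a minimal sketch of that idea, not something from the standard library; code_points is a name invented here, and it simply joins a high surrogate with the low surrogate that follows it.

```python
def code_points(text):
    """Yield integer code points, joining UTF-16 surrogate pairs.

    Works on narrow builds (where non-BMP characters appear as two
    code units) and on wide builds / Python 3 (where they appear as one).
    """
    pending_high = None
    for ch in text:
        unit = ord(ch)
        if pending_high is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                # Valid pair: combine high and low surrogate.
                yield 0x10000 + ((pending_high - 0xD800) << 10) + (unit - 0xDC00)
                pending_high = None
                continue
            # Lone high surrogate: pass it through unchanged.
            yield pending_high
            pending_high = None
        if 0xD800 <= unit <= 0xDBFF:
            pending_high = unit
        else:
            yield unit
    if pending_high is not None:
        yield pending_high

# The two characters from the demo, written as explicit surrogate pairs,
# which is how a narrow build holds them internally.
narrow_style = u"\ud87e\udd20\ud87e\udd21"
print([hex(cp) for cp in code_points(narrow_style)])
```

With this helper, len(list(code_points(s))) gives the number of characters rather than the number of UTF-16 code units, whichever kind of build is running.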