How to get unicode month name in Python?

Posted 2020-07-21 05:55

Question:

I am trying to get a unicode version of calendar.month_abbr[6]. If I don't specify an encoding for the locale, I don't know how to convert the string to unicode. The example code below shows my problem:

>>> import locale
>>> import calendar
>>> locale.setlocale(locale.LC_ALL, ("ru_RU"))
'ru_RU'
>>> print repr(calendar.month_abbr[6])
'\xb8\xee\xdd'
>>> print repr(calendar.month_abbr[6].decode("utf8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 0: unexpected code byte
>>> locale.setlocale(locale.LC_ALL, ("ru_RU", "utf8"))
'ru_RU.UTF8'
>>> print repr(calendar.month_abbr[6])
'\xd0\x98\xd1\x8e\xd0\xbd'
>>> print repr(calendar.month_abbr[6].decode("utf8"))
u'\u0418\u044e\u043d'

Any ideas how to solve this? The solution doesn't have to look like this. Any solution that gives me the abbreviated month name in unicode is fine.

Answer 1:

Change the last line in your code:

>>> print calendar.month_abbr[6].decode("utf8")
Июн

repr() shows the escaped form of the string, which hides the fact that you already have what you need.

getlocale() can also be used to get the encoding of the current locale:

>>> locale.setlocale(locale.LC_ALL, 'en_US')
'en_US'
>>> locale.getlocale()
('en_US', 'ISO8859-1')
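
As a rough sketch of how the reported encoding would be used: the helper below (decode_month_name is a hypothetical name, not part of locale or calendar) decodes a byte string with an explicitly supplied encoding. The byte strings are copied from the question. The ISO 8859-5 encoding for the plain ru_RU locale is an assumption inferred from those bytes; on a real system you would pass whatever getlocale() reports.

```python
def decode_month_name(raw, encoding):
    """Return raw as unicode text, decoding byte strings with the given encoding."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already unicode text (e.g. on Python 3)

# Byte strings copied from the question's interpreter session.
# 'iso8859_5' is an assumed encoding for the plain ru_RU locale.
from_plain_ru = decode_month_name(b'\xb8\xee\xdd', 'iso8859_5')
from_utf8_ru = decode_month_name(b'\xd0\x98\xd1\x8e\xd0\xbd', 'utf8')

# Both locales yield the same unicode month abbreviation.
print(from_plain_ru == from_utf8_ru == u'\u0418\u044e\u043d')
```

The point is that the decode step only works when the codec matches the encoding the locale actually used, which is why the first attempt in the question failed with utf8.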

Other modules that might be useful to you:

  • PyICU - a better approach to internationalization. locale returns either the initial or the inflected form of a month name, depending on the locale database in your OS (so you can't rely on it for languages such as Russian!), and it returns the name in some locale-dependent encoding. PyICU has separate format specifiers for the initial and inflected forms (so you can select the appropriate one for your case) and works with unicode.
  • pytils - a set of tools for working with the Russian language, including dates. It has hard-coded month names as a workaround for locale limitations.


Answer 2:

What you need is:

…
myencoding = locale.getpreferredencoding()
print repr(calendar.month_abbr[6].decode(myencoding))
…
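
The same idea can be wrapped in a small function. This is only a sketch: month_abbr_unicode is a hypothetical name, and the choice to prefer the encoding reported for LC_TIME and fall back to getpreferredencoding(False) is an assumption, not something stated in the answers. On Python 3 calendar already returns unicode text, so the decode branch is only taken on Python 2.

```python
import calendar
import locale

def month_abbr_unicode(month, encoding=None):
    """Return the abbreviated month name for the current locale as unicode text."""
    name = calendar.month_abbr[month]
    if isinstance(name, bytes):
        # Python 2: the name is a byte string in the locale's encoding.
        if encoding is None:
            # Assumption: use the encoding reported for LC_TIME, else the preferred one.
            encoding = locale.getlocale(locale.LC_TIME)[1] or locale.getpreferredencoding(False)
        name = name.decode(encoding)
    return name

print(repr(month_abbr_unicode(6)))
```

Passing encoding explicitly is useful when the locale was set without an encoding and the system reports one you do not expect.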