UTF-8 error with Python and gettext

My editor saves files as UTF-8, so every string shown here is UTF-8 encoded on disk.

I have a Python script like this:

# -*- coding: utf-8 -*-
...
parser = optparse.OptionParser(
  description=_('automates the dice rolling in the classic game "risk"'), 
  usage=_("usage: %prog attacking defending"))

Then I ran xgettext to extract the strings and got a .pot file, which boils down to:

"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr ""

After that, I used msginit to create de.po, which I filled in like this:

"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: auto_dice.py:16
msgid "automates the dice rolling in the classic game \"risk\""
msgstr "automatisiert das Würfeln bei \"Risiko\""

Running the script, I get the following error:

  File "/usr/lib/python2.6/optparse.py", line 1664, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 60: ordinal not in range(128)

How can I fix that?

Tags: python localization gettext

3 Answers

仙女界的扛把子

#2 · 2019-03-29 04:29

I'm not familiar with this, but it appears to be a known bug in 2.6 that's been fixed in 2.7:

http://bugs.python.org/issue2931

If it's not feasible for you to use 2.7, try this workaround:

http://mail.python.org/pipermail/python-dev/2006-May/065458.html


姐就是有狂的资本

#3 · 2019-03-29 04:32

My suspicion is that the problem is caused by _("string") returning a byte string and not a Unicode string.

The obvious workaround is this:

parser = optparse.OptionParser(
        description=_('automates the dice rolling in the classic game "risk"').decode('utf-8'),
        usage=_("usage: %prog attacking defending").decode('utf-8'))

But that feels wrong.

ugettext or install(True) may help.

The Python gettext docs give these examples:

import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.ugettext

or:

import gettext
gettext.install('myapplication', '/usr/share/locale', unicode=1)
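Building on the two docs snippets above, here is a minimal sketch that should behave the same on Python 2 and Python 3. The domain name 'auto_dice' and the 'locale' directory are assumptions made for illustration; neither appears in the question. It binds _ to ugettext where that method exists and to gettext otherwise:

```python
import gettext

# 'auto_dice' and 'locale' are assumed names for illustration only.
# fallback=True returns a NullTranslations object instead of raising
# when no matching .mo file is found.
t = gettext.translation('auto_dice', localedir='locale',
                        languages=['de'], fallback=True)

# Python 2 offers ugettext (returns unicode); Python 3 only has gettext,
# which already returns text rather than bytes.
_ = getattr(t, 'ugettext', t.gettext)

usage = _('usage: %prog attacking defending')
```

If no catalog is found, the untranslated msgid comes back, so the script keeps working while the .mo file is still missing.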

I am trying to reproduce your problem, and even if I use install(unicode=1), I get back a byte string (str type).

Either I am using gettext incorrectly, or I am missing a character encoding declaration in my .po/.mo file.

I will update when I know more.

xlt = _('automates the dice rolling in the classic game "risk"')
print type(xlt)
if isinstance(xlt, str):
    print 'gettext returned a str (wrong)'
    print xlt
    print xlt.decode('utf-8').encode('utf-8')
elif isinstance(xlt, unicode):
    print 'gettext returned a unicode (right)'
    print xlt.encode('utf-8')

(One other possibility is to use escapes or Unicode code points in the .po file, but that doesn't sound like fun.)

(Or you could look at your system's .po files to see how they handle non-ASCII characters.)


倾城　Initia

#4 · 2019-03-29 04:38

That error means encode was called on a byte string. Python 2 first decodes the byte string to Unicode using the system default encoding (ASCII), and only then re-encodes it with the encoding you specified. The decode step is what fails on the non-ASCII byte.

Generally, the way to resolve it is to call s.decode('utf-8') (or whatever encoding the strings are in) before trying to use the strings. It might also work if you just use unicode literals: u'automates...' (that depends on how strings are substituted from .po files, which I don't know about).
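To make the failure and the fix concrete, here is a small sketch. The helper implicit_encode is hypothetical; it only spells out the decode-then-encode step that Python 2 performs behind the scenes when encode is called on a byte string. The German text is taken from the de.po in the question:

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes, as a byte-string gettext call would return them.
raw = u'automatisiert das Würfeln bei "Risiko"'.encode('utf-8')


def implicit_encode(byte_string, encoding):
    """Hypothetical helper: mimic Python 2's str.encode(), which
    decodes with the ASCII default before encoding."""
    return byte_string.decode('ascii').encode(encoding, 'replace')


error_message = None
try:
    implicit_encode(raw, 'utf-8')
except UnicodeDecodeError as exc:
    # Same error class and codec as in the traceback from optparse.
    error_message = str(exc)

# The fix: decode explicitly with the real encoding first.
text = raw.decode('utf-8')
fixed = text.encode('utf-8', 'replace')
```

Decoding once, right where the bytes enter the program, keeps everything downstream (including optparse) working with Unicode strings.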

This sort of confusing behaviour is improved in Python 3, which won't try to convert bytes to unicode unless you specifically tell it to.
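A quick way to see the difference is to mix bytes and text on purpose. This is only an illustrative sketch; the variable names are made up for the example:

```python
# -*- coding: utf-8 -*-

raw = u'Würfeln'.encode('utf-8')

try:
    mixed = raw + u'!'
    outcome = 'implicit conversion succeeded'
except TypeError:
    # Python 3 refuses to combine bytes and text at all.
    outcome = 'TypeError'
except UnicodeDecodeError:
    # Python 2 attempts an implicit ASCII decode, which fails here.
    outcome = 'UnicodeDecodeError'

# Being explicit about the encoding works on both versions.
explicit = raw.decode('utf-8') + u'!'
```

On Python 3 the mistake surfaces as a TypeError right at the point where the types are mixed, instead of as an encoding error somewhere inside a library.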

