PyQt4: Non-ASCII character detected in trUtf8 string

Posted 2019-08-07 17:26

I'm trying to figure out how to feed the PyQt tr() or trUtf8() functions UTF-8 text.

Here are some example strings:

self.tr('ελληνικά')
self.tr(u'ελληνικά')
self.tr('ελληνικά'.encode('utf-8'))
self.tr(u'ελληνικά'.encode('utf-8'))
self.trUtf8('ελληνικά')
self.trUtf8(u'ελληνικά')
self.trUtf8('ελληνικά'.encode('utf-8'))
self.trUtf8(u'ελληνικά'.encode('utf-8'))

The ones with self.tr display as gibberish in Qt Linguist. The ones with self.trUtf8 display fine, but they trigger a warning when running pylupdate4:

Non-ASCII character detected in trUtf8 string

The leading u and the .encode('utf-8') don't seem to make any difference, at least at parsing time (using pylupdate4).

What is the correct way to proceed?

I'm also wondering about the role of this line:

QtCore.QTextCodec.setCodecForTr(QtCore.QTextCodec.codecForName("utf-8"))

But I know it has no effect on file parsing by pylupdate4; it could only make a difference at execution time.

1 answer
对你真心纯属浪费
#2 · 2019-08-07 17:40

The difference between tr and trUtf8 is that the latter explicitly declares that the encoding is UTF-8.

On its own, tr implies nothing about the encoding of the string, so you must either pass it only ASCII strings or explicitly set an appropriate encoding using setCodecForTr. But as you surmised, that only has an effect at runtime. For pylupdate to use the same encoding, you need to set a corresponding variable in the .pro file:

CODECFORTR = UTF-8
SOURCES = source.py
TRANSLATIONS = translation.ts

(It seems that pylupdate will assume a latin-1 encoding without that, so any characters not available in that encoding will end up as mojibake).
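To see why a latin-1 assumption produces gibberish, here is a minimal sketch in plain Python (no PyQt needed). It only illustrates the general mechanism of UTF-8 bytes being read with the wrong codec and is not pylupdate's actual code; the helper names misread_as_latin1 and repair are made up for this example.

```python
# -*- coding: utf-8 -*-
# Illustration only: what happens when UTF-8 bytes are decoded as latin-1.
# The function names below are hypothetical helpers, not part of PyQt.


def misread_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake):
    """Undo the mistake: recover the raw bytes and decode them as UTF-8."""
    return mojibake.encode("latin-1").decode("utf-8")


original = u"ελληνικά"
garbled = misread_as_latin1(original)

# Each Greek letter takes two bytes in UTF-8, so the misread string
# has twice as many characters as the original.
print(len(original), len(garbled))
print(repr(garbled))
print(repair(garbled) == original)
```

Since every byte value is a valid latin-1 character, the wrong decode never raises an error; it just silently yields the wrong text, which is consistent with the tr strings showing up as gibberish rather than failing outright.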

As for the warning messages: they are probably there to reflect the corresponding warnings in the Qt docs for trUtf8 regarding portability issues.

The best way to proceed is to use tr and explicitly set the encoding to UTF-8. The trUtf8 function is effectively obsolete in Qt4. It doesn't even exist in Qt5, which assumes UTF-8 for everything, so eventually you won't even need to explicitly set the encoding.
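As a rough sketch of that recommendation, the two halves (the project file for pylupdate4 and the codec for runtime) could be kept together in one small module. The function names build_pro_file and set_runtime_codec are hypothetical helpers invented for this example; the only PyQt4 call used is the setCodecForTr line from the question, and the file names are the ones from the .pro snippet above.

```python
# -*- coding: utf-8 -*-
# Sketch under assumptions: helper names are hypothetical, not a PyQt4 API.


def build_pro_file(sources, translations, codec="UTF-8"):
    """Return .pro text that declares the codec of tr() strings for pylupdate4."""
    lines = [
        "CODECFORTR = %s" % codec,
        "SOURCES = %s" % " ".join(sources),
        "TRANSLATIONS = %s" % " ".join(translations),
    ]
    return "\n".join(lines) + "\n"


def set_runtime_codec(codec="utf-8"):
    """Runtime counterpart: tell tr() how to decode byte strings (needs PyQt4)."""
    # Imported lazily so build_pro_file can be used without PyQt4 installed.
    from PyQt4 import QtCore
    QtCore.QTextCodec.setCodecForTr(QtCore.QTextCodec.codecForName(codec))


print(build_pro_file(["source.py"], ["translation.ts"]))
```

In an application you would call set_runtime_codec() once at startup, before any tr() call, and keep the source files themselves saved as UTF-8 so that both settings match what is actually on disk.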
