I'm trying to figure out how to feed the PyQt tr() or trUtf8() functions UTF-8 text.
Here are some example strings:
self.tr('λληνικά')
self.tr(u'εληνικά')
self.tr('ελνικά'.encode('utf-8'))
self.tr(u'ελληικά'.encode('utf-8'))
self.trUtf8('λληνικ')
self.trUtf8(u'εληνιά')
self.trUtf8('ελνι'.encode('utf-8'))
self.trUtf8(u'ελλκά'.encode('utf-8'))
The ones with self.tr display as gibberish in Qt Linguist. The ones with self.trUtf8 display fine, but they trigger a warning when running pylupdate4:
Non-ASCII character detected in trUtf8 string
The leading u and the .encode('utf-8') don't seem to make any difference, at least at parsing time (using pylupdate4).
What is the correct way to proceed?
I'm also wondering about the role of this line:
QtCore.QTextCodec.setCodecForTr(QtCore.QTextCodec.codecForName("utf-8"))
I know it has no effect on file parsing by pylupdate4, though; it could only make a difference at execution time.
The difference between tr and trUtf8 is that the latter explicitly declares that the encoding is UTF-8.
On its own, tr implies nothing about the encoding of the string, so you must either pass it only ASCII strings or explicitly set an appropriate encoding using setCodecForTr. But as you surmised, that will only have an effect at runtime. For pylupdate to use that encoding as well, you need to set a corresponding variable in the pro file:
CODECFORTR = UTF-8
SOURCES = source.py
TRANSLATIONS = translation.ts
(It seems that pylupdate will assume a latin-1 encoding without that, so any characters not available in that encoding will end up as mojibake).
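To see why a wrong encoding assumption produces gibberish, here is a minimal sketch in plain Python (no Qt involved). It only illustrates the general effect of reading UTF-8 bytes as latin-1; it does not reproduce what pylupdate does internally, and the sample string and variable names are chosen purely for illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: UTF-8 bytes misread as latin-1 turn into mojibake.

text = u'ελληνικά'                      # sample string (assumption, for illustration)
utf8_bytes = text.encode('utf-8')       # the bytes as stored in a UTF-8 source file

# A reader that assumes latin-1 maps every single byte to one character,
# so each two-byte Greek letter becomes two unrelated Latin characters.
misread = utf8_bytes.decode('latin-1')

# A reader that knows the real encoding recovers the original text.
reread = utf8_bytes.decode('utf-8')

print(misread == text)   # the latin-1 reading does not match the original
print(reread == text)    # the UTF-8 reading does
```

The misread string has one character per byte, which is why non-ASCII text looks roughly twice as long as it should when this happens.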
As for the warning messages: they are probably there to reflect the corresponding warnings in the Qt docs for trUtf8 regarding portability issues.
The best way to proceed is to use tr and explicitly set the encoding to UTF-8. The trUtf8 function is effectively obsolete in Qt4. It doesn't even exist in Qt5, which assumes UTF-8 for everything, so eventually you won't even need to explicitly set the encoding.