We have some text containing German umlauts represented using e.g. 'a' + COMBINING DIAERESIS ($cc $88).
Any idea how to convert such text properly to UTF-8?
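First, if it's not already a unicode string, decode it. Second, call unicodedata.normalize() on it; the 'NFC' form composes 'a' + COMBINING DIAERESIS into the single character 'ä'. Third, encode the result.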
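The three steps can be sketched roughly as follows. This is a minimal sketch, assuming Python 3 and that the input bytes are already UTF-8 (the $cc $88 pair is the UTF-8 encoding of U+0308); the helper name compose_utf8 and the source_encoding parameter are made up for illustration, and the sample word is not from the original text.

```python
import unicodedata


def compose_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode, compose combining sequences (NFC), and re-encode as UTF-8.

    `source_encoding` is an assumption: change it if the input is not UTF-8.
    """
    # Step 1: decode the bytes into a str (Python 3's unicode type).
    text = raw.decode(source_encoding)
    # Step 2: NFC merges 'a' + U+0308 into the precomposed U+00E4.
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the composed text as UTF-8.
    return composed.encode("utf-8")


# 'a' followed by COMBINING DIAERESIS, as raw UTF-8 bytes.
decomposed = b"Ma\xcc\x88dchen"
result = compose_utf8(decomposed)
print(result)  # b'M\xc3\xa4dchen'
```

Under Python 2 the same idea applies, except that a byte string is decoded into a unicode object first. If the text is already a str (or unicode) rather than bytes, skip the decode step and pass it to unicodedata.normalize() directly.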