Converting text containing COMBINING DIAERESIS to

Posted 2020-03-24 05:37

Question:

We have some text containing German umlauts represented as a base letter plus a combining mark, e.g. 'a' + COMBINING DIAERESIS ($cc $88).

Any idea how to properly convert such text to UTF-8?

Answer 1:

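First, if the text is not already a unicode string, decode it. Second, pass it through unicodedata.normalize(), using the 'NFC' form so that 'a' + COMBINING DIAERESIS is composed into the single character 'ä'. Third, encode the result as UTF-8.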
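The three steps can be sketched in Python 3 roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `to_composed_utf8` is made up for illustration, and it assumes the input arrives as bytes that are already UTF-8 (which is what the $cc $88 sequence in the question suggests).

```python
import unicodedata


def to_composed_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode, compose combining marks, re-encode.

    Assumes `raw` is encoded in `source_encoding` (UTF-8 by default).
    """
    # Step 1: decode the bytes into a str (a unicode string).
    text = raw.decode(source_encoding)
    # Step 2: NFC composes base letter + combining mark into one code point
    # wherever a precomposed character exists (a + U+0308 -> U+00E4).
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the composed text as UTF-8.
    return composed.encode("utf-8")


# 'a' followed by COMBINING DIAERESIS (bytes cc 88 in UTF-8)
raw = b"a\xcc\x88"
print(to_composed_utf8(raw))  # b'\xc3\xa4', i.e. the single character 'ä'
```

If the text is already a str, only the normalize step is needed. Sequences that have no precomposed equivalent are left as base letter plus combining mark.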