My string is Niệm Bồ Tát (Thiá»n sÆ° Nhất Hạnh)
and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh).
I found that this site can do the conversion: http://www.enderminh.com/minh/utf8-to-unicode-converter.aspx
so I tried to do it in Python:
```python
mystr = '09. Bát Nhã Tâm Kinh'
mystr.decode('utf-8')
```
But the result is not what I expected, even though the original string is UTF-8.
Note: these are Vietnamese characters.
How can I fix this? Is it some Windows encoding? How can I detect the encoding here?
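A likely explanation, offered as an assumption rather than a diagnosis: patterns such as `ư` turning into `Æ°` are what you get when UTF-8 bytes are decoded as Windows-1252 (cp1252). This Python 3 sketch reproduces the effect on part of the title from the question and then reverses it:

```python
# Correct Vietnamese text (from the question).
original = "Bát Nhã Tâm Kinh"

# Simulate the assumed bug: encode as UTF-8, then wrongly decode the bytes as cp1252.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # BÃ¡t NhÃ£ TÃ¢m Kinh

# Undo it by running the same two steps in reverse.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored)  # Bát Nhã Tâm Kinh
```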
I'm not sure what you can do with this kind of data in general, but for the example in your original post, this works:
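A minimal Python 3 sketch of one common fix, assuming the text is UTF-8 that was mis-decoded exactly once as cp1252 or Latin-1 (the helper name `fix_mojibake` is hypothetical). Latin-1 is tried as a fallback because cp1252 leaves a few bytes, such as 0x81, undefined: `ề` is the UTF-8 bytes E1 BB 81, so `Thiền` degrades into `Thiá»` followed by an invisible control character. If that invisible character has been stripped, as seems to have happened in the question's string, the byte is lost and this approach cannot recover it.

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo one round of UTF-8 -> cp1252/Latin-1 mojibake.

    Re-encodes the text with each single-byte encoding in turn and decodes
    the resulting bytes as UTF-8. If a step fails, the text is probably not
    mojibake of that kind, so the next candidate is tried. Returns the input
    unchanged if no candidate works.
    """
    for legacy in ("cp1252", "latin-1"):
        try:
            return text.encode(legacy).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


print(fix_mojibake("BÃ¡t NhÃ£ TÃ¢m Kinh"))      # Bát Nhã Tâm Kinh
print(fix_mojibake("sÆ°"))                       # sư
print(fix_mojibake("Thi\u00e1\u00bb\u0081n"))   # Thiền (needs the U+0081 kept)
print(fix_mojibake("Thiền"))                    # Thiền (already correct, unchanged)
```

Because the helper returns its input unchanged when neither attempt succeeds, it also serves as a rough check of whether a string looks like this particular kind of mojibake at all.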
The only thing that helped me with a broken Cyrillic string was this library: https://github.com/LuminosoInsight/python-ftfy
This module fixes pretty much every kind of mojibake and works much better than online decoders.
It can be installed with:

```
pip install ftfy
```