Hi, I have a website in Traditional Chinese, and when I check the site statistics it tells me that the search term for the website is å%8f°å%8d%97 親å%90é¤%90廳
which obviously makes no sense to me. My question is: what is this encoding called? And is there a way to use Python to decode this character string? Thank you.
It is called a mutt encoding; the underlying bytes have been mangled beyond their original meaning and no longer form a real encoding.
It was once URL-quoted UTF-8, but it has since been interpreted as Latin-1 without unquoting the URL escapes. I was able to un-mangle it by reversing those steps: encode the text back to Latin-1 bytes, unquote the URL escapes, then decode the result as UTF-8.
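A minimal Python 3 sketch of that un-mangling, assuming the text really is URL-quoted UTF-8 that was read as Latin-1. The helper name `unmangle` is made up for illustration. The sample uses only the first word of the search term, written with escapes, because part of the term as displayed above contains characters that cannot be encoded as Latin-1.

```python
from urllib.parse import unquote_to_bytes


def unmangle(text: str) -> str:
    """Undo 'URL-quoted UTF-8 read as Latin-1' (hypothetical helper)."""
    # Step 1: recover the raw bytes that were misread as Latin-1.
    raw = text.encode("latin-1")
    # Step 2: turn the leftover %XX escapes back into single bytes.
    unquoted = unquote_to_bytes(raw)
    # Step 3: the result should now be valid UTF-8.
    return unquoted.decode("utf-8")


# "\xe5%8f\xb0\xe5%8d%97" is how å%8f°å%8d%97 looks when written with escapes.
print(unmangle("\xe5%8f\xb0\xe5%8d%97"))  # prints 台南
```

If any step raises `UnicodeEncodeError` or `UnicodeDecodeError`, the string was probably mangled in some other way and this recipe does not apply.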
You can use chardet to guess the encoding. Install the library with `pip install chardet`.
The library includes a CLI utility, chardetect (or chardetect3, as appropriate), that takes the path to a file. Once you know the encoding, you can decode the file's contents in Python with that codec, or convert the file from the shell.
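As a rough sketch of the "once you know the encoding" step, using only the standard library: the function names `read_text_as` and `convert_to_utf8` are hypothetical, and the encoding name passed in (for example `"big5"`) is assumed to be whatever chardetect reported for your file.

```python
from pathlib import Path


def read_text_as(path, encoding):
    """Read a file's bytes and decode them with the given codec name."""
    return Path(path).read_bytes().decode(encoding)


def convert_to_utf8(src, dst, encoding):
    """Decode src with the given encoding and write it to dst as UTF-8."""
    text = read_text_as(src, encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("stats.txt", "stats-utf8.txt", "big5")` would re-encode a Big5 file as UTF-8 (both file names are placeholders). A wrong guess for the encoding will usually surface as a `UnicodeDecodeError` or as visibly garbled text.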