I am now working on a small tool to request and decode a webpage, on which the Chinese characters are stored as string like
\u5c0f\u738b\u5b50\u003a\u6c49\u6cd5\u82f1\u5bf9\u7167
in the source code, something of unicode. I want to convert it to Chinese characters.
I can make it through this website http://rishida.net/tools/conversion/. But How can I make it using python?
Those are Unicode codepoints already. They represent Chinese characters, but using escape codes that are easier on the developer:
You do not have to do anything to convert those; the
\uxxxx
escape form is simply another way to express the same codepoint. See String Literals:Python interprets those escape codes when reading the source code to construct the unicode value.
If the source of the data is not from Python source code but from the web, you have JSON data instead, which uses the same escape format:
Note that the value then needs to be part of a larger string, one that at least includes quotes to mark this a string.
Also note that the JSON string escape format differs from Python's when it comes to non-BMP (supplementary) codepoints; JSON treats those like UTF-16 does, by creating a surrogate pair and use two
\uxxxx
sequences for such a codepoint. In Python you'd use a\Uhhhhhhhh
32-bit hex value.