When I try to get the content of a tag using "unicode(head.contents[3])", I get output similar to this: "Christensen Sk\xf6ld". I want the escape sequence to be returned as the actual character in the string. How can I do this in Python?
I suspect that it's actually working correctly. When Python shows the repr of a string (for example at the interactive prompt), it escapes non-ASCII characters, since not all terminals support Unicode. If you actually print the string, though, it should display correctly. See the following example:
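A minimal sketch of the difference, written for Python 3 (an assumption; the question uses Python 2, where repr() itself produces the escaped form). Here ascii() stands in for Python 2's repr():

```python
# The same characters as in the question; \xf6 is the code point for "ö".
name = u"Christensen Sk\xf6ld"

# ascii() shows non-ASCII characters as escapes, much like repr() in Python 2.
escaped = ascii(name)
print(escaped)

# print() writes the characters themselves, so a Unicode-capable
# terminal shows the accented letter.
print(name)

# The string holds one real character there, not a four-character escape.
print(len(name))
```

The escape only appears in the displayed representation; the string itself already contains the accented letter.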
Assuming Python sees the name as a normal string, you'll first have to decode it to unicode:
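For instance, assuming the bytes are Latin-1 encoded (a guess based on 0xF6 being "ö" in that encoding; the real source encoding may differ), a Python 3 sketch could look like this:

```python
# Raw bytes as they might arrive from a file or socket.
# Assumption: they are Latin-1 encoded.
raw = b"Christensen Sk\xf6ld"

# bytes -> text; the codec name must match the real encoding of the data.
name = raw.decode("latin-1")

# ascii() shows the decoded text with the non-ASCII character escaped.
print(ascii(name))
```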
Another way of achieving this:
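One equivalent form is to pass the bytes and the encoding to the text type's constructor. This is a sketch for Python 3, again assuming Latin-1 input; in Python 2 the corresponding call is unicode(raw, "latin-1"):

```python
# Assumption: the bytes are Latin-1 encoded, as in the question's example.
raw = b"Christensen Sk\xf6ld"

# str(bytes, encoding) decodes just like raw.decode(encoding).
name = str(raw, "latin-1")

print(ascii(name))
```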
Note the "u" that Python 2 shows in front of the string, signalling that it is unicode. If you print this, the accented letter is shown properly:
BTW: when necessary, you can use the "encode" method to turn the unicode into e.g. a UTF-8 string:
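A short sketch of that step, written for Python 3, where the result of encode() is a bytes object:

```python
name = u"Christensen Sk\xf6ld"

# text -> bytes; "ö" (U+00F6) becomes the two bytes 0xC3 0xB6 in UTF-8.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)

# Decoding with the same codec gives back the original text.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == name)
```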
Given a byte string with Unicode escapes, such as b"\N{SNOWMAN}", calling b"\N{SNOWMAN}".decode('unicode-escape') will produce the expected Unicode string u'\u2603'.