Python: convert file content to Unicode escape form

Posted 2019-05-26 06:08

Question:

For example, I have a file a.js whose content is:

Hello, 你好, bye.  

It contains two Chinese characters whose Unicode escape form is \u4f60\u597d.
I want to write a Python program that converts the Chinese characters in a.js to their Unicode escape form and writes the result to b.js, whose content should be: Hello, \u4f60\u597d, bye.

My code:

fp = open("a.js")
content = fp.read()
fp.close()

fp2 = open("b.js", "w")
result = content.decode("utf-8")
fp2.write(result)
fp2.close()  

But the Chinese characters still come out as single characters, not as the ASCII escape sequences I want.

Answer 1:

>>> print u'Hello, 你好, bye.'.encode('unicode-escape')
Hello, \u4f60\u597d, bye.

But you should consider using JSON, via the json module.
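As a rough illustration of the JSON suggestion, here is a minimal sketch. It assumes Python 3 (the snippet above is Python 2), and the variable names are only for illustration. json.dumps escapes non-ASCII characters by default (ensure_ascii=True), and it also wraps the string in double quotes, which may or may not be what you want in b.js.

```python
import json

text = "Hello, 你好, bye."

# ensure_ascii=True is the default, so non-ASCII characters
# are written as \uXXXX escapes inside a quoted JSON string.
encoded = json.dumps(text)
print(encoded)  # "Hello, \u4f60\u597d, bye."

# json.loads reverses the escaping and gives back the original text.
decoded = json.loads(encoded)
```

The advantage over a hand-rolled escape is that any JSON or JavaScript parser can read the result back without special handling.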



Answer 2:

You can try the codecs module:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

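import codecs

# Read a.js, decoding it from cp936 (GBK); a is a unicode object
a = codecs.open("a.js", "r", "cp936").read()

# Write the text back out as UTF-16; note this re-encodes the characters, it does not produce \uXXXX escapes
codecs.open("b.js", "w", "utf16").write(a)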
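For the original a.js to b.js task, something along these lines might work. This is only a sketch, assuming Python 3 and a UTF-8 input file; escape_non_ascii and convert_file are hypothetical helper names, not part of any library. It escapes only non-ASCII characters, so newlines and backslashes in the source file are left alone, and characters outside the Basic Multilingual Plane are written as JavaScript-style surrogate pairs.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a \\uXXXX escape (hypothetical helper)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII is copied through unchanged.
            out.append(ch)
        elif code <= 0xFFFF:
            out.append("\\u{:04x}".format(code))
        else:
            # Outside the BMP: emit a UTF-16 surrogate pair, as JavaScript expects.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append("\\u{:04x}\\u{:04x}".format(high, low))
    return "".join(out)


def convert_file(src, dst, encoding="utf-8"):
    """Read src in the given encoding and write an ASCII-only copy to dst."""
    # newline="" keeps the file's original line endings untouched.
    with open(src, encoding=encoding, newline="") as f:
        content = f.read()
    with open(dst, "w", encoding="ascii", newline="") as f:
        f.write(escape_non_ascii(content))
```

Usage would be convert_file("a.js", "b.js"), or convert_file("a.js", "b.js", encoding="cp936") if the file is GBK-encoded as this answer assumes.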


Answer 3:

I found that, in Python 2, repr(content.decode("utf-8")) returns "u'Hello, \u4f60\u597d, bye.'",
so repr(content.decode("utf-8"))[2:-1] does the job by stripping the leading u' and the trailing quote.
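On Python 3 the repr trick no longer escapes anything, because repr keeps printable non-ASCII characters as they are. The built-in ascii() behaves like the Python 2 repr here. A small sketch, assuming Python 3 (the variable names are only for illustration):

```python
content = "Hello, 你好, bye."

# ascii() returns a quoted, ASCII-only representation of the string.
quoted = ascii(content)

# There is no u prefix in Python 3, so strip one quote from each end.
escaped = quoted[1:-1]
print(escaped)  # Hello, \u4f60\u597d, bye.
```

Like the repr approach, this also escapes newlines and backslashes and changes how quote characters are represented, so it is best suited to short strings rather than whole files.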



Answer 4:

You can use repr:

a = u"Hello, 你好, bye. "
print repr(a)[2:-1]

Or you can use the encode method:

print a.encode("raw_unicode_escape")
print a.encode("unicode_escape")
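The two codecs are not interchangeable, so it may help to see where they differ. A short sketch, assuming Python 3, where encode returns bytes that have to be decoded back to text before printing or writing:

```python
text = "Hello, 你好, bye.\n"

# unicode_escape also escapes control characters such as the newline.
escaped = text.encode("unicode_escape").decode("ascii")

# raw_unicode_escape leaves the newline (and backslashes) untouched.
# It emits code points 128-255 as raw Latin-1 bytes, hence the latin-1 decode.
raw = text.encode("raw_unicode_escape").decode("latin-1")

print(escaped)  # Hello, \u4f60\u597d, bye.\n
print(raw)      # Hello, \u4f60\u597d, bye. (followed by a real newline)
```

For a whole file, raw_unicode_escape is usually closer to what the question asks for, since line breaks survive, but accented Latin-1 characters such as é would not be escaped by it.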