I have a string.
s = u"<script language=javascript>alert('\xc7\xeb\xca\xe4\xc8\xeb\xd5\xfd\xc8\xb7\xd1\xe9\xd6\xa4\xc2\xeb,\xd0\xbb\xd0\xbb!');location='index.asp';</script></script>"
How can I convert `s` into a UTF-8 string? I have tried `s.decode('gbk').encode('utf-8')`, but Python reports this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 35-50: ordinal not in range(128)
If you can keep the alert text in a separate string `a`, you can convert just that string; printing the result then shows the readable message.
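A minimal sketch, assuming `a` is a Unicode string holding only the alert text, with each GBK byte stored as one code point below U+0100 (exactly as in `s` above). Encoding with `latin-1` maps every such code point back to the byte with the same value, and those bytes can then be decoded as GBK. The calls used work in both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# The alert text from the question: GBK bytes stored in a Unicode string,
# one code point per byte.
a = u'\xc7\xeb\xca\xe4\xc8\xeb\xd5\xfd\xc8\xb7\xd1\xe9\xd6\xa4\xc2\xeb,\xd0\xbb\xd0\xbb!'

# latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so this recovers the original GBK byte sequence unchanged.
gbk_bytes = a.encode('latin-1')

# Decode the bytes as GBK to get real text, then encode that text as UTF-8.
text = gbk_bytes.decode('gbk')
utf8_bytes = text.encode('utf-8')
```

On a UTF-8 terminal, `print(text)` shows 请输入正确验证码,谢谢! ("please enter the correct verification code, thanks!").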
If you want to extract the substring automatically in one go, pull it out of the `alert('...')` call first and convert it the same way; printing it gives the same message.
You are mixing apples and oranges. The GBK-encoded string is not a Unicode string and should therefore not end up in a `u'...'` string. This is the correct way to do it in Python 2.
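A minimal sketch of that approach, assuming `g` holds the GBK bytes of the alert message written as a byte literal. The `b'...'` prefix is accepted by Python 2.6+ and Python 3, so the same lines run in both:

```python
# The GBK-encoded message written as a *byte* string, not a u'...' literal.
g = b'\xc7\xeb\xca\xe4\xc8\xeb\xd5\xfd\xc8\xb7\xd1\xe9\xd6\xa4\xc2\xeb,\xd0\xbb\xd0\xbb!'

# Bytes -> text: decode with the codec the bytes were written in.
text = g.decode('gbk')

# Text -> bytes again, this time as UTF-8.
utf8 = text.encode('utf-8')
```

The rule of thumb is decode on the way in (bytes to text) and encode on the way out (text to bytes). Calling `.decode()` on something that is already Unicode is what triggers the implicit ASCII encode, and hence the error, in Python 2.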
Notice how the initializer for `g` (the variable holding the GBK bytes), which is passed to `.decode('gbk')`, is not written as a Unicode string but as a plain byte string. See also http://nedbatchelder.com/text/unipain.html
In Python 2, first repair your Unicode string, whose code points are really GBK bytes; then you can encode it to UTF-8 as you wish.
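A sketch of that conversion as a small helper; `gbk_mojibake_to_utf8` is a hypothetical name. It spells out the one-byte-per-code-point step with `bytearray`, which gives the same bytes as `s.encode('latin-1')` for this input, and it runs under Python 2.7 and Python 3:

```python
s = u"<script language=javascript>alert('\xc7\xeb\xca\xe4\xc8\xeb\xd5\xfd\xc8\xb7\xd1\xe9\xd6\xa4\xc2\xeb,\xd0\xbb\xd0\xbb!');location='index.asp';</script></script>"


def gbk_mojibake_to_utf8(text):
    """Hypothetical helper: `text` is a Unicode string whose code points
    are really GBK bytes (all below U+0100). Returns UTF-8 bytes."""
    # Unicode string -> byte string, one byte per code point.
    # bytearray raises ValueError if any code point is >= 256,
    # i.e. if the input is not purely bytes-as-characters.
    raw = bytes(bytearray(ord(c) for c in text))
    # Now a genuine byte string: decode as GBK, then encode as UTF-8.
    return raw.decode('gbk').encode('utf-8')


result = gbk_mojibake_to_utf8(s)
```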
I had the same problem: a string like the one above, which I wanted to convert to readable text.
I worked out my own solution and also tried yours.
Hope this helps.
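One possible way to handle a string like this, sketched here as an illustration rather than the solution referred to above: convert the whole string at once. GBK leaves bytes 0x00 to 0x7F as plain ASCII, so the HTML around the message survives the `latin-1`/`gbk` round trip unchanged; this assumes every code point in the string is below U+0100:

```python
s = u"<script language=javascript>alert('\xc7\xeb\xca\xe4\xc8\xeb\xd5\xfd\xc8\xb7\xd1\xe9\xd6\xa4\xc2\xeb,\xd0\xbb\xd0\xbb!');location='index.asp';</script></script>"

# Whole-string repair: the ASCII markup maps to itself, and each pair of
# high bytes becomes one Chinese character.
fixed = s.encode('latin-1').decode('gbk')

# UTF-8 bytes, e.g. for writing to a file or sending over the network.
fixed_utf8 = fixed.encode('utf-8')
```

The 20 high bytes collapse into 10 characters, so `fixed` is 10 characters shorter than `s`.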
Take a look at the `unicodedata` module, but I think one way to do this is:
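A minimal sketch showing what `unicodedata` reveals here, assuming only the first two garbled characters of the alert text: `unicodedata.name` reports them as Latin-1 letters (GBK bytes read one at a time), and after the `latin-1`/`gbk` round trip the pair becomes a single CJK ideograph:

```python
import unicodedata

# The first two "characters" of the garbled alert text from the question.
garbled = u'\xc7\xeb'

# Each one is a Latin-1 letter, i.e. a single GBK byte misread as a character.
for ch in garbled:
    print(unicodedata.name(ch))
# LATIN CAPITAL LETTER C WITH CEDILLA
# LATIN SMALL LETTER E WITH DIAERESIS

# Reinterpret the two code points as bytes and decode them as GBK.
fixed = garbled.encode('latin-1').decode('gbk')

# Now a single character whose name starts with "CJK UNIFIED IDEOGRAPH-".
print(unicodedata.name(fixed))
```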