Decoding non-standard characters to UTF-8 in Python

Posted 2019-07-25 09:27

I have a program that receives byte-encoded text via a webhook in Django (written in Python). Decoding from bytes to a UTF-8 string works for normal letters, but it breaks when an apostrophe ( ' ) is sent in. This is the code I use to decode the text:

from urllib.parse import parse_qs

encoded = request.body  # raw bytes of the webhook POST body
decoded = parse_qs(encoded)
body = decoded[b'body'][0].decode("utf-8")

And this is the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 5: ordinal not in range(128)
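
The character named in the error, U+2019, is not the ASCII apostrophe (U+0027). It is the typographic RIGHT SINGLE QUOTATION MARK, which "smart quote" features commonly substitute for a typed apostrophe. The quick check below uses only the standard `unicodedata` and `urllib.parse` modules. It prints the character's name, its UTF-8 bytes, and its percent-encoded form. The percent-encoded form is how the character would arrive if the webhook sends form-encoded data, which is an assumption about this particular webhook.

```python
import unicodedata
from urllib.parse import quote

ch = "\u2019"  # the character named in the error message

print(unicodedata.name(ch))  # RIGHT SINGLE QUOTATION MARK
print(ch.encode("utf-8"))    # b'\xe2\x80\x99' (three bytes, none of them ASCII)
print(quote(ch))             # %E2%80%99 (percent-encoded UTF-8)
print(ch == "'")             # False: it is not the ASCII apostrophe
```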

I'd like it to decode apostrophes successfully. I'm also concerned it might break if an emoji is sent in. I'd like to be able to escape emoji and other stray characters such as ∫ while still preserving the actual words in the message.

1 answer

爷的心禁止访问 · reply #2 · 2019-07-25 09:52

parse_qs works with decoded UTF-8 strings but chokes on bytes that contain non-ASCII characters. For example:

This fails:

import urllib.parse

a = b'restaurant_type=caf\xc3\xa9'  # "café" as raw UTF-8 bytes
urllib.parse.parse_qs(a)
# > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 ...

but this works okay:

import urllib.parse

a = b'restaurant_type=caf\xc3\xa9'
urllib.parse.parse_qs(a.decode())  # bytes.decode() defaults to UTF-8
# > {'restaurant_type': ['café']}
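
Applied to the code in the question, the same idea means decoding `request.body` before passing it to `parse_qs`. The keys and values then come back as `str`, so the field is looked up as `'body'` rather than `b'body'`. This is a sketch under stated assumptions. `parse_webhook_body` is a hypothetical helper, and the sketch assumes the webhook sends UTF-8 form-encoded data with a `body` field, as the question's code implies.

```python
from urllib.parse import parse_qs


def parse_webhook_body(raw: bytes) -> str:
    """Hypothetical helper: return the 'body' field of a form-encoded payload.

    Assumes the payload is UTF-8. Once parse_qs receives a str, it decodes
    percent-escapes such as %E2%80%99 as UTF-8 (its default encoding).
    """
    params = parse_qs(raw.decode("utf-8"))  # str in, str keys and values out
    return params["body"][0]


# In the Django view this would be called as parse_webhook_body(request.body).
curly = parse_webhook_body(b"body=It%E2%80%99s+fine")  # "It’s fine"
raw_utf8 = parse_webhook_body(b"body=caf\xc3\xa9")      # "café"
```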

Is that what you are asking?
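
The question's second concern is escaping emoji and symbols such as ∫ while keeping the words. One possible approach checks each character's Unicode general category with the standard `unicodedata` module. This sketch assumes that "escape" means replacing each such character with a Python-style backslash escape, and `escape_symbols` is a hypothetical name. Letters such as é and punctuation such as ’ pass through unchanged.

```python
import unicodedata


def escape_symbols(text: str) -> str:
    """Backslash-escape non-ASCII symbol characters and keep everything else.

    Characters whose Unicode general category starts with "S" are symbols:
    Sm (math, e.g. ∫), So (other, which covers most emoji), Sc (currency)
    and Sk (modifier). Letters and punctuation are left untouched.
    """
    out = []
    for ch in text:
        if ord(ch) > 127 and unicodedata.category(ch).startswith("S"):
            # e.g. "∫" becomes the six characters \u222b
            out.append(ch.encode("ascii", "backslashreplace").decode("ascii"))
        else:
            out.append(ch)  # ASCII, letters such as é, punctuation such as ’
    return "".join(out)


escaped = escape_symbols("It\u2019s caf\u00e9 \u222b \U0001F600")
# escaped == "It’s café \\u222b \\U0001f600"
```

Multi-code-point emoji still leave some characters unescaped. This applies to sequences joined with U+200D ZERO WIDTH JOINER or followed by the variation selector U+FE0F, because those joining characters are not in an "S" category.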
