Decoding non-standard characters to UTF-8 in Python

Posted 2019-07-25 09:27

I have a program that receives byte-encoded text via a webhook in Django (written in Python). Decoding the bytes to a UTF-8 string works for normal letters, but it breaks when an apostrophe (’) is sent in. This is the code I use to decode the text:

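from urllib.parse import parse_qs

encoded = request.body  # raw POST body (bytes) from the Django HttpRequest
decoded = parse_qs(encoded)  # bytes in, so keys and values come back as bytes
body = decoded[b'body'][0].decode("utf-8")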

And this is the error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 5: ordinal not in range(128)
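For context, an `'ascii'` codec *encode* error means some code is turning a `str` that contains ’ (U+2019) into ASCII bytes, which cannot represent that character. In the snippet above this most likely happens inside `parse_qs`: on the Python 3 versions of that time, when it is given `bytes`, it percent-decodes the values as UTF-8 and then converts the results back to bytes using ASCII. The helper name `ascii_encode_error` below is hypothetical; this is a minimal reproduction of the message, independent of Django:

```python
def ascii_encode_error(text: str) -> str:
    """Return the error message produced when encoding text as ASCII ("" if none)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return ""


print(ascii_encode_error("Don\u2019t"))
# 'ascii' codec can't encode character '\u2019' in position 3: ordinal not in range(128)

# The same character is no problem for UTF-8, which uses three bytes for it.
print("Don\u2019t".encode("utf-8"))
# b'Don\xe2\x80\x99t'
```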

I'd like it to decode apostrophes successfully. I'm also concerned it might break if an emoji is sent in, so I'd like to be able to escape emoji and stray symbols like ∫ while still preserving the real words in the message.
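One way to get that behavior, once the body has been decoded to `str`, is to keep letters, digits, punctuation and spaces and escape everything else. This is only an illustration: the `escape_symbols` helper and the choice of which Unicode categories to keep are hypothetical, not something the question or answer specifies. Under this rule ∫ (category Sm) and most emoji (category So) become `\u...`/`\U...` escapes, while accented letters and ’ (category Pf) pass through unchanged.

```python
import unicodedata

# Hypothetical policy: keep letters (L*), marks (M*), numbers (N*),
# punctuation (P*) and separators (Z*); escape every other character.
KEEP_CATEGORIES = ("L", "M", "N", "P", "Z")


def escape_symbols(text: str) -> str:
    """Replace characters outside KEEP_CATEGORIES with \\u / \\U escapes."""
    out = []
    for ch in text:
        if unicodedata.category(ch)[0] in KEEP_CATEGORIES:
            out.append(ch)
        else:
            # unicode_escape turns e.g. '∫' into b'\\u222b'
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)


print(escape_symbols("I\u2019m at the caf\u00e9 \U0001F600 \u222b"))
# I’m at the café \U0001f600 \u222b
```

Note that this rule also escapes control characters such as newlines (category Cc); add them to the kept set if the messages need them.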

Tags: python django encoding utf-8 decoding

1 answer

爷的心禁止访问

#2 · 2019-07-25 09:52

parse_qs works with decoded UTF-8 strings (str) but chokes on non-ASCII bytes. For example:

This fails:

import urllib.parse

a = b'restaurant_type=caf\xc3\xa9'
urllib.parse.parse_qs(a)
# > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 ...

but this works okay:

a = b'restaurant_type=caf\xc3\xa9'
urllib.parse.parse_qs(a.decode())  # decode bytes to str first (UTF-8 by default)
# > {'restaurant_type': ['café']}
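Applied to the code in the question, the same idea means decoding `request.body` before calling `parse_qs` and then indexing with a `str` key instead of `b'body'`. A minimal sketch, assuming the webhook sends `application/x-www-form-urlencoded` data encoded as UTF-8; the `extract_body` helper is hypothetical. If the sender really does use that content type, Django's `request.POST` may already hold decoded `str` values, which would make the manual parsing unnecessary.

```python
from urllib.parse import parse_qs


def extract_body(raw: bytes, field: str = "body") -> str:
    """Decode a form-encoded payload and return one field as str ("" if absent).

    Assumes the payload is application/x-www-form-urlencoded and UTF-8.
    """
    # Decode bytes -> str first; parse_qs then percent-decodes values as UTF-8.
    text = raw.decode("utf-8")
    params = parse_qs(text)  # keys and values are str, not bytes
    values = params.get(field)
    return values[0] if values else ""


# In the Django view this would be: body = extract_body(request.body)
print(extract_body(b"body=Don%E2%80%99t+panic+%F0%9F%98%80"))
# Don’t panic 😀
```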

Is that what you are asking?
