How do I properly work with unicode characters in

I'm working on a Python plugin for Google Quick Search Box, and it's doing some odd things with non-ASCII characters. The code works fine until I try to construct a string containing non-ASCII characters (ü has been my test character). I am using the following snippet for the construction, where new_task is the variable that holds the input from GQSB.

```
the_sig = ("%sapi_key%sauth_token%smethod%sname%sparse%stimeline%s" %
           (api_secret, api_key, the_token, method, new_task, doParse, timeline))
```

It's giving me this error:

```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```

If I am understanding correctly, this is because I am trying to combine a Unicode character with an ASCII string. Everything I could find told me to declare the encoding at the top with this:

```
# -*- coding: iso-8859-15 -*-
```

Which I have. And when I pull the snippet that constructs the string into a new script, it works just fine. But for some reason, in the context of the rest of the code, it fails every time. The only thing I can think of is that it is because it's inside its own class, but that doesn't make any sense to me.

The full code can be found on GitHub here

Thanks in advance for any help. I am stumped on this one.

Tags: python unicode encoding ascii

3 Answers

唯我独甜

Reply #2 · 2019-05-24 03:17

This is a bit beyond my expertise, but I think # -*- coding: iso-8859-15 -*- at the top declares the text encoding that your Python source file is saved in.

Is it really saved in iso-8859-15?
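
To make that question concrete, here is a minimal sketch of what a mismatch would look like. It assumes the file (or the input) is actually UTF-8, which the question does not confirm; in UTF-8, ü is the two bytes 0xC3 0xBC, and 0xC3 is the byte named in the error message. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Assumption: these are the two bytes stored for "ü" when the data is UTF-8.
raw = b'\xc3\xbc'

# Decoded with the matching codec, the bytes become one character.
as_utf8 = raw.decode('utf-8')

# Decoded as iso-8859-15, every byte becomes its own character,
# so the same data turns into two unrelated characters.
as_latin9 = raw.decode('iso-8859-15')

print(len(as_utf8))    # 1
print(len(as_latin9))  # 2
```

If the declared encoding and the real encoding disagree like this, non-ASCII literals in the source silently turn into different text.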


Melony?

Reply #3 · 2019-05-24 03:19

There are a few things you should do to fix this.

1. Convert all string literals that contain non-ASCII characters to Unicode literals. Example: u'über'.
2. Do intermediate processing on Unicode. In other words, if you receive an encoded string (no matter the encoding), decode it to Unicode before working on it. Example:
```
s = utf8_string.decode('utf8') + latin1_string.decode('latin1')
```
3. When outputting the string or sending it somewhere, encode it with an encoding that your receiver understands. Example: send(s.encode('utf8')).

Complete example:

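```
# -*- coding: utf-8 -*-
# get_possibly_nonascii_input() and send_output() stand in for your own I/O.

# Decode on the way in.
input1 = get_possibly_nonascii_input().decode('iso-8859-1')
input2 = get_possibly_nonascii_input().decode('iso-8859-1')
input3 = u'üvw'

# Work on Unicode only.
s = u'%s -> %s' % (input3, (input1 + input2).upper())

# Encode on the way out.
send_output(s.encode('utf8'))
```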
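
Applied to the signature string from the question, the same three steps might look like the sketch below. This is not the plugin's actual code: to_text and build_signature are hypothetical helpers, do_parse stands in for doParse, and UTF-8 is assumed as the encoding of whatever GQSB hands over (the 0xc3 byte in the error message is consistent with that, but the question does not confirm it).

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Decode byte strings to Unicode text; return anything else unchanged."""
    # On Python 2.6+ `bytes` is just another name for `str`.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_signature(api_secret, api_key, the_token, method,
                    new_task, do_parse, timeline):
    """Build the signature string as Unicode, then encode it once at the end."""
    # Step 2: decode every input before combining anything.
    parts = tuple(to_text(p) for p in (api_secret, api_key, the_token,
                                       method, new_task, do_parse, timeline))
    # Step 1: the format string itself is a Unicode literal.
    sig = u"%sapi_key%sauth_token%smethod%sname%sparse%stimeline%s" % parts
    # Step 3: encode only when the value leaves the program.
    return sig.encode('utf-8')


the_sig = build_signature(b'secret', b'key', b'token', b'some.method',
                          b'\xc3\xbcber', b'1', b'42')
print(repr(the_sig))
```

Decoding in one helper keeps the encoding assumption in a single place, so it is easy to change if the input turns out to be something other than UTF-8.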


家丑人穷心不美

Reply #4 · 2019-05-24 03:32

I guess you're using Python 2.x.

The file encoding declaration specifies how string literals are read by the interpreter.

You should handle all strings as unicode values, not str ones. If you read a str from the outside world, you should decode it to unicode explicitly. The same applies in reverse to output: encode the unicode value to a str explicitly before sending it anywhere.

```
# -*- coding: utf-8 -*-
u_dia_str = '\xc3\xbc'   # str
lambda_unicode = u'λ'    # unicode

# input value
u_dia = u_dia_str.decode('utf-8')

sig_unicode = u'%s%s' % (u_dia, lambda_unicode)
# => u'üλ'

# output value
sig_str = sig_unicode.encode('utf-8')
# => '\xc3\xbc\xce\xbb'
```
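
As for where the error in the question probably comes from: when Python 2 combines a str with a unicode value, it first decodes the str implicitly with the ascii codec. The sketch below performs that decoding explicitly so the failure is visible. It assumes the offending value holds the UTF-8 bytes for ü, which matches the 0xc3 in the traceback but is otherwise a guess.

```python
# Assumption: the failing value is the UTF-8 byte sequence for "ü".
raw = b'\xc3\xbc'

# What Python 2 does behind the scenes when a str meets a unicode value.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding explicitly with the right codec avoids the implicit ascii step.
text = raw.decode('utf-8')
combined = u'%s%s' % (text, u'\u03bb')
```

The printed message is the same one quoted in the question, which suggests that one of the formatted values is a non-ASCII str while another is already unicode.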

