I'm working on a Python plugin for Google Quick Search Box, and it's doing some odd things with non-ASCII characters. The code seems to work fine up until I try constructing a string containing non-ASCII characters (ü has been my test character). I am using the following code snippet for the construction, with new_task being the variable that comes in from GQSB.
the_sig = ("%sapi_key%sauth_token%smethod%sname%sparse%stimeline%s" %
(api_secret, api_key, the_token, method, new_task, doParse, timeline))
It's giving me this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
If I am understanding correctly, this is because I am trying to combine a Unicode character with an ASCII string. Everything I could find told me to declare the encoding at the top with this:
# -*- coding: iso-8859-15 -*-
Which I have. And when I pull the code snippet that constructs the string into a new script, it works just fine. But for some reason, in the context of the rest of the code, it fails every time. The only thing I can think of is that it is because it's inside its own class, but that doesn't make any sense to me.
The full code can be found on GitHub here
Thanks in advance for any help. I am stumped on this one.
This is a bit beyond my expertise, but I think
# -*- coding: iso-8859-15 -*-
at the top declares the text encoding that your Python source file is saved in. Is it really saved in iso-8859-15?
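For what it's worth, the byte 0xc3 in the error message hints that the incoming text may be UTF-8 rather than iso-8859-15: ü is the two bytes 0xc3 0xbc in UTF-8, but the single byte 0xfc in iso-8859-15. A quick check, as a sketch only (the variable names are made up for illustration):

```python
# Illustrative only: compare how the test character is encoded in each charset.
u_umlaut = u'\xfc'  # the character "ü", written as an escape so the file encoding does not matter

as_utf8 = u_umlaut.encode('utf-8')          # two bytes: 0xc3 0xbc
as_latin9 = u_umlaut.encode('iso-8859-15')  # one byte: 0xfc

# Decoding the UTF-8 bytes as ASCII fails with the same kind of error as in the question.
try:
    as_utf8.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

If that guess is right, the declaration at the top of the file is not what matters for new_task, because that value arrives at run time and is not a literal in the source file.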
There are a few things you should do to fix this.
Convert all string literals that contain non-ASCII characters to Unicode literals. Example:
u'über'
Do intermediate processing on Unicode. In other words, if you receive an encoded string (no matter the encoding), decode it to Unicode before working on it.
When outputting the string or sending it somewhere, encode it with an encoding that your receiver understands. Example:
send(s.encode('utf8'))
Complete example:
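A minimal sketch of the three steps together, assuming the input arrives as UTF-8 bytes. The names raw, send and sent are hypothetical stand-ins for whatever your plugin actually receives and calls:

```python
# Sketch of the decode -> process -> encode pattern.
# `raw`, `send` and `sent` are hypothetical stand-ins for plugin input and output.
sent = []

def send(data):
    """Hypothetical receiver that collects the encoded bytes it is given."""
    sent.append(data)

raw = b'\xc3\xbcber'                 # encoded input: UTF-8 bytes for "über" (assumed encoding)
s = raw.decode('utf8')               # decode to Unicode as early as possible
greeting = u'Gr\xfc\xdfe, %s' % s    # u'Grüße, %s': combine only Unicode values and literals
send(greeting.encode('utf8'))        # encode once, when the data leaves the program
```

The escapes (\xfc, \xdf) are used here only so the sketch does not depend on how the file is saved; with a correct coding declaration you can write the characters directly, as in u'über'.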
I guess you're using Python 2.x.
The file encoding declaration specifies how string literals are read by the interpreter.
You should handle all strings as unicode values, not str ones. If you read a str from the outside world, you should decode it to unicode explicitly. The same applies to outputting strings: encode them explicitly.
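Applied to the signature string from the question, that could look roughly like the following. This is only a sketch: it assumes GQSB hands the plugin UTF-8 bytes (which the 0xc3 byte in the error suggests but does not prove), to_unicode is a hypothetical helper, and all the values are placeholders.

```python
def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass Unicode through unchanged."""
    if isinstance(value, bytes):   # bytes is an alias of str on Python 2.6+
        return value.decode(encoding)
    return value

# Placeholder values; in the plugin these come from the API setup and from GQSB.
api_secret, api_key, the_token = u'secret', u'key', u'token'
method, doParse, timeline = u'some.method', u'1', u'42'
new_task = b'\xc3\xbc'             # "ü" as UTF-8 bytes, standing in for the GQSB input

# Use a Unicode format string and make sure every argument is Unicode too.
the_sig = (u"%sapi_key%sauth_token%smethod%sname%sparse%stimeline%s" %
           tuple(to_unicode(v) for v in
                 (api_secret, api_key, the_token, method, new_task, doParse, timeline)))

# Encode only when a byte string is needed, for example before hashing or sending.
sig_bytes = the_sig.encode('utf-8')
```

Decoding every argument in one place keeps Python 2 from attempting its implicit ASCII decode when a str and a unicode value meet in the same format operation.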