NOTE: For more answers related to this, please see Special Characters in Google Calculator
I noticed when grabbing the return value for a Google Calculator calculation, the thousands place is separated by a rather odd character. It is not simply a space.
Let's take the example of converting $4,000 USD to GBP.
If you visit the following Google link:
http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2 497.81441 British pounds",error: "",icc: true}
This looks reasonable, and the thousands place appears to be separated by a whitespace character.
However, if you enter the following into your command line:
curl -s "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2?498.28243 British pounds",error: "",icc: true}
That question mark (?) is a replacement character. What is going on?
AppleScript returns a different replacement character:
{lhs: "4000 U.S. dollars",rhs: "2†498.28243 British pounds",error: "",icc: true}
I am also getting from other sources:
{lhs: "4000 U.S. dollars",rhs: "2�498.28243 British pounds",error: "",icc: true}
It turns out that � is the proper Unicode replacement character 65533.
Can anyone give me insight into what Google is passing me?
According to my tests with
curl
in the Terminal on OSX, by changing the International character encoding in the Terminal preferences : The encoding is iso latin 1.When I set the encoding to UTF8 : I get "2?498.28243"
When I set the encoding to MacRoman : I get "2†498.28243"
First solution : use a user agent from any browser (Safari on OSX 10.6.8 in this example)
Second solution : use
iconv
It's a non-breaking space, U+00A0. It's to ensure that the number won't get broken at the end of a line.
Google returns the correct encoding (UTF-8) however:
so ...
Needless to say, some parts in whatever you use in processing that response seem to be horrible broken in regard to Unicode. Something I'd hope wouldn't really happen that often in this millennium, but apparently it still does.
I figured out what it was by fiddling around in PowerShell a bit:
Also a quick look at the response headers revealed that the encoding is set correctly.
Try