This issue seems specific to microsofttranslator.com, so please ... any answers, if you can test against it ...
Using the following URL for translation: http://api.microsofttranslator.com/V2/Ajax.svc/TranslateArray, I send some fantastic arguments via cURL and get back the following result:
[
    {
        "From":"en",
        "OriginalTextSentenceLengths":[13],
        "TranslatedText":"我是最好的",
        "TranslatedTextSentenceLengths":[5]
    },
    {
        "From":"en",
        "OriginalTextSentenceLengths":[16],
        "TranslatedText":"你是最好的",
        "TranslatedTextSentenceLengths":[5]
    }
]
When I call json_decode($output, true); on the output from cURL, it fails and reports a syntax error in the returned JSON:
json_last_error() == JSON_ERROR_SYNTAX
The headers being returned with the JSON:
Response Headers
Cache-Control:no-cache
Content-Length:244
Content-Type:application/x-javascript; charset=utf-8
Date:Sat, 06 Aug 2011 13:35:08 GMT
Expires:-1
Pragma:no-cache
X-MS-Trans-Info:s=63644
Raw content:
[{"From":"en","OriginalTextSentenceLengths":[13],"TranslatedText":"我是最好的","TranslatedTextSentenceLengths":[5]},{"From":"en","OriginalTextSentenceLengths":[16],"TranslatedText":"你是最好的","TranslatedTextSentenceLengths":[5]}]
cURL code:
$texts = array("i am the best" => 0, "you are the best" => 0);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = array(
    'appId' => $bing_appId,
    'from' => 'en',
    'to' => 'zh-CHS',
    'texts' => json_encode(array_keys($texts))
);
curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data));
$output = curl_exec($ch);
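The API is prepending a byte order mark (BOM) to the response.
The string data itself is UTF-8, but it starts with U+FEFF, the BOM character, which json_decode does not accept. Encoded as UTF-8, U+FEFF takes three bytes (EF BB BF), so just strip out the first three bytes and then call json_decode.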
...
$output = curl_exec($ch);
// Insert some sanity checks here... then,
$output = substr($output, 3);
...
$decoded = json_decode($output, true);
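The "sanity checks" mentioned in the comment could look something like the sketch below. It removes the three-byte UTF-8 encoding of U+FEFF only when the string really starts with it, so the code keeps working if the API ever stops sending the BOM. The helper name strip_utf8_bom is made up for this illustration, and the sample string is a stand-in for the real cURL output.

```php
<?php
// Hypothetical helper (not part of PHP or the Translator API):
// remove a leading UTF-8 BOM only if the string actually starts with one.
function strip_utf8_bom($text)
{
    $bom = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Stand-in for the cURL response: a BOM followed by a small JSON array.
$output = "\xEF\xBB\xBF" . '[{"From":"en","TranslatedText":"ok"}]';

$decoded = json_decode(strip_utf8_bom($output), true);
if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
    die('json_decode failed with error code ' . json_last_error());
}
print_r($decoded);
```

Compared with an unconditional substr($output, 3), this avoids cutting off the opening bracket when no BOM is present.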
Here's the entirety of my test code.
$texts = array("i am the best" => 0, "you are the best" => 0);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = array(
    'appId' => $bing_appId,
    'from' => 'en',
    'to' => 'zh-CHS',
    'texts' => json_encode(array_keys($texts))
);
curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data));
$output = curl_exec($ch);
$output = substr($output, 3);
print_r(json_decode($output, true));
Which gives me
Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)
See the Wikipedia entry on BOM.
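The headers in the question appear consistent with this diagnosis. As a rough check, the sketch below measures the raw content quoted in the question: the visible JSON is 241 bytes, while Content-Length reports 244, and the 3-byte difference matches U+FEFF encoded as UTF-8. It assumes the file is saved as UTF-8, and $received is a simulated response, not real API output.

```php
<?php
// Raw content copied from the question (this file must be saved as UTF-8).
$json = '[{"From":"en","OriginalTextSentenceLengths":[13],"TranslatedText":"我是最好的","TranslatedTextSentenceLengths":[5]},{"From":"en","OriginalTextSentenceLengths":[16],"TranslatedText":"你是最好的","TranslatedTextSentenceLengths":[5]}]';

// Simulated response: the same JSON with a UTF-8 BOM prepended.
$received = "\xEF\xBB\xBF" . $json;

echo strlen($json), "\n";                     // byte length of the visible JSON
echo strlen($received), "\n";                 // byte length including the BOM
echo bin2hex(substr($received, 0, 3)), "\n";  // leading bytes as hex
```

Running bin2hex(substr($output, 0, 3)) on the real cURL output is a quick way to see whether the BOM is actually there before stripping anything.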
There is nothing syntactically wrong with your JSON string. It is possible that the JSON is coming back with bytes that are not valid UTF-8, but json_decode() does not throw an exception in that case: it returns NULL, and json_last_error() would report JSON_ERROR_UTF8 rather than JSON_ERROR_SYNTAX.
Test Code:
<?php
$json = '
[
    {
        "From":"en",
        "OriginalTextSentenceLengths":[13],
        "TranslatedText":"我是最好的",
        "TranslatedTextSentenceLengths":[5]
    },
    {
        "From":"en",
        "OriginalTextSentenceLengths":[16],
        "TranslatedText":"你是最好的",
        "TranslatedTextSentenceLengths":[5]
    }
]
';
// json_decode() returns NULL on failure and raises no PHP error,
// so check json_last_error() instead of relying on $php_errormsg.
$out = json_decode($json, TRUE);
if (json_last_error() !== JSON_ERROR_NONE) {
    throw new Exception(json_last_error_msg());
} else {
    print_r($out);
}
?>
Output:
$ php -f jsontest.php
Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)
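Note that a string pasted into a source file no longer carries any leading bytes the API may have sent, so this test only shows that the visible JSON is fine. As a sketch of how the failure modes differ, the snippet below decodes two made-up inputs and records json_last_error() for each. As far as I know, a leading UTF-8 BOM is reported as JSON_ERROR_SYNTAX, while invalid UTF-8 inside a string is reported as JSON_ERROR_UTF8.

```php
<?php
// Case 1: valid JSON preceded by a UTF-8 BOM (EF BB BF).
$withBom = json_decode("\xEF\xBB\xBF" . '[1,2,3]', true);
$bomError = json_last_error();

// Case 2: a JSON string containing a byte sequence that is not valid UTF-8.
$badUtf8 = json_decode('"' . "\xB1\x31" . '"', true);
$utf8Error = json_last_error();

// Both calls return NULL; only the error code tells the cases apart.
var_dump($withBom, $badUtf8);
var_dump($bomError === JSON_ERROR_SYNTAX);
var_dump($utf8Error === JSON_ERROR_UTF8);
```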