My task is simple: make a POST request to translate.google.com and get the translation. In the following example I'm translating the word "hello" into Russian.
header('Content-Type: text/plain; charset=utf-8'); // optional
error_reporting(E_ALL | E_STRICT);
$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => implode("\r\n", array(
            'Content-type: application/x-www-form-urlencoded',
            'Accept-Language: en-us,en;q=0.5', // optional
            'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' // optional
        )),
        'content' => http_build_query(array(
            'prev' => '_t',
            'hl' => 'en',
            'ie' => 'UTF-8',
            'text' => 'hello',
            'sl' => 'en',
            'tl' => 'ru'
        ))
    )
));
$page = file_get_contents('http://translate.google.com/translate_t', false, $context);
require '../simplehtmldom/simple_html_dom.php';
$dom = str_get_html($page);
$translation = $dom->find('#result_box', 0)->plaintext;
echo $translation;
Removing the lines marked as optional does not change the output. Either way, I get garbled characters:
������
I tried
echo mb_convert_encoding($translation, 'UTF-8');
But I get
ÐÒÉ×ÅÔ
Does anybody know how to solve this problem?
UPDATE:
- I forgot to mention that all my PHP files are encoded in UTF-8 without a BOM.
- When I change the target language to "en" (that is, translate from English to English), it works fine.
- I don't think the library I'm using is the cause, because outputting the whole $page without passing it through the library functions shows the same problem.
- I'm using PHP 5.
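A quick way to check what the response actually contains is to look at the raw bytes instead of the rendered text. This is a minimal sketch, assuming the mbstring extension is available and assuming the ÐÒÉ×ÅÔ output above corresponds to these six bytes read as ISO-8859-1; `describe_bytes` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: report whether a byte string is valid UTF-8
// and return its raw bytes as hex for inspection.
function describe_bytes($bytes)
{
    return array(
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
        'hex'        => bin2hex($bytes),
    );
}

// Assumed sample: the six bytes that an ISO-8859-1 viewer renders as "ÐÒÉ×ÅÔ".
$sample = "\xD0\xD2\xC9\xD7\xC5\xD4";
$info = describe_bytes($sample);

var_dump($info['valid_utf8']); // false: this byte sequence is not valid UTF-8
echo $info['hex'], "\n";       // d0d2c9d7c5d4
```

If `valid_utf8` comes back false, the body is in some other single-byte or legacy encoding, and it has to be converted with an explicit source charset before it is echoed on a UTF-8 page.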
First off, is your browser set to UTF-8? In Firefox you can set your text encoding in View->Character Encoding. Make sure you have "Unicode (UTF-8)" selected. I would also set View->Character Encoding->Auto-Detect to "Universal."
Secondly, you could try passing the FILE_TEXT flag, like so:
Accept-Charset is not really that optional. You should specify UTF-8 there, because Russian characters cannot be represented in ISO-8859-1.
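If the server still answers in another charset, the body can be converted explicitly. The sketch below assumes the response declares its charset in a Content-Type header (PHP exposes the response headers in `$http_response_header` after `file_get_contents()` on an HTTP URL) and assumes mbstring is installed. `charset_from_headers` and `to_utf8` are hypothetical helper names, and the KOI8-R header is an illustrative guess based on the bytes in the question, not something the source confirms.

```php
<?php
// Hypothetical helper: extract the charset from a list of raw response
// headers. The last match wins, so headers from redirects are overridden.
function charset_from_headers(array $headers, $default = 'UTF-8')
{
    $charset = $default;
    foreach ($headers as $header) {
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w.:-]+)/i', $header, $m)) {
            $charset = $m[1];
        }
    }
    return $charset;
}

// Hypothetical helper: convert a body from its declared charset to UTF-8.
function to_utf8($body, $charset)
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // already UTF-8, nothing to do
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Assumed example data: headers as they might appear in $http_response_header,
// and the six bytes from the question.
$headers = array('HTTP/1.0 200 OK', 'Content-Type: text/html; charset=KOI8-R');
$body    = "\xD0\xD2\xC9\xD7\xC5\xD4";

$charset = charset_from_headers($headers);
$utf8    = to_utf8($body, $charset);

echo $utf8, "\n"; // привет
```

Note that calling `mb_convert_encoding($translation, 'UTF-8')` without a third argument makes PHP assume its internal encoding as the source, which is why the attempt in the question only replaced one kind of garbage with another.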
See whether this post helps: "CURL import character encoding problem".
Also you can try this snippet (taken from php.net)