Unexpected encoding in WebClient result

Posted 2019-05-06 18:20

Question:

I am trying to use WebClient to translate the word 'Banana' into Russian:

private void button1_Click(object sender, EventArgs e)
{
    Navigate("http://translate.google.ru/translate_a/t?client=x&text=Banana&hl=en&sl=en&tl=ru");
}

private void Navigate(String address)
{
    WebClient client = new WebClient();
    client.Proxy = WebRequest.DefaultWebProxy;
    client.Credentials = new NetworkCredential("user", "password", "domain");
    client.Proxy.Credentials = new NetworkCredential("user", "password", "domain");
    string _stranslate = client.DownloadString(new Uri(address));
}

I expect _stranslate to contain:

{"sentences":[{"trans":"Банан","orig":"Banana@","translit":"Banan @","src_translit":""}],"src":"en","server_time":0}

but I get this instead:

{"sentences":[{"trans":"вБОБО","orig":"Banana@","translit":"Banan @","src_translit":""}],"src":"en","server_time":0}

Can someone help me?

Answer 1:

Check the response headers; the Content-Type header tells you which encoding to use.

Content-Type => text/javascript; charset=KOI8-R

So just add this line before calling DownloadString():

client.Encoding = Encoding.GetEncoding(20866);

or

client.Encoding = Encoding.GetEncoding("KOI8-R");

A complete list of encodings can be found in the documentation for the Encoding class.
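To see why the charset matters, here is a minimal sketch that reproduces the garbled output from the question. It assumes the client decoded the KOI8-R bytes as Windows-1251 (the usual default ANSI code page on a Russian Windows install; this is an assumption, the question does not say). MojibakeDemo and Misdecode are hypothetical names for illustration only. The RegisterProvider call is needed on .NET Core / .NET 5+; on .NET Framework these encodings are built in and the static constructor can be dropped.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Makes legacy code pages such as KOI8-R and Windows-1251
        // available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes text with the charset the server actually used, then decodes
    // the bytes with the charset the client wrongly assumed.
    public static string Misdecode(string text, string actualCharset, string assumedCharset)
    {
        byte[] bytes = Encoding.GetEncoding(actualCharset).GetBytes(text);
        return Encoding.GetEncoding(assumedCharset).GetString(bytes);
    }
}
```

Calling MojibakeDemo.Misdecode("Банан", "koi8-r", "windows-1251") returns "вБОБО", the same garbled string as in the question, which is consistent with the KOI8-R charset reported in the header.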

Another way is to use System.Net.Mime.ContentType to read the charset from the response itself:

byte[] data = client.DownloadData(url);
ContentType contentType = new System.Net.Mime.ContentType(client.ResponseHeaders[HttpResponseHeader.ContentType]);
string _stranslate = Encoding.GetEncoding(contentType.CharSet).GetString(data);
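The header-based approach above fails if the server sends no Content-Type header, omits the charset, or names one that .NET does not recognize. A hedged sketch of a more defensive variant follows. ResponseDecoder and Decode are hypothetical names, and falling back to UTF-8 is an assumption about what a sensible default is, not something the service guarantees. As before, the RegisterProvider call is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.Net.Mime;
using System.Text;

public static class ResponseDecoder
{
    static ResponseDecoder()
    {
        // Makes legacy code pages such as KOI8-R available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes a response body using the charset named in the Content-Type
    // header value. Falls back to UTF-8 when the header is missing or
    // malformed, or when the charset is absent or unknown.
    public static string Decode(byte[] body, string contentTypeHeader)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            try
            {
                string charset = new ContentType(contentTypeHeader).CharSet;
                if (!string.IsNullOrEmpty(charset))
                {
                    encoding = Encoding.GetEncoding(charset);
                }
            }
            catch (FormatException)
            {
                // Malformed Content-Type value: keep the UTF-8 fallback.
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the UTF-8 fallback.
            }
        }
        return encoding.GetString(body);
    }
}
```

With the code from the question, this would be used by first calling client.DownloadData(address) and then passing the returned bytes together with client.ResponseHeaders[HttpResponseHeader.ContentType] to ResponseDecoder.Decode.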


Answer 2:

Add this before your client.DownloadString():

client.Encoding = System.Text.Encoding.UTF8;

Your encoding is likely getting messed up when you read the string.

Putting your URL into an HTTP header viewer, I see the following in the response headers:

Content-Type: text/javascript; charset=UTF-8
Content-Language: ru

Basically, you need to find out what encoding they are sending back and set your encoding to match.

It is very important to set the encoding before you call DownloadString().



Answer 3:

IMHO a better solution: add the URI query parameter oe=UTF-8 and use UTF-8 everywhere:

var nameValueCollection = new NameValueCollection
{
    {"client", "x"},
    {"text", HttpUtility.UrlEncode(text)},
    {"hl", "en"},
    {"sl", fromLanguage},
    {"tl", toLanguage},
    {"ie", "UTF-8"},
    {"oe", "UTF-8"}
};

string htmlResult;
using (var wc = new WebClient())
{
    wc.Encoding = Encoding.UTF8;
    wc.QueryString = nameValueCollection;
    htmlResult = wc.DownloadString(GoogleAddress);
}