CURL import character encoding problem

Posted 2019-01-12 04:14

I'm using CURL to import some code. However, in French, all the characters come out funny. For example: Bonjour ...

I don't have access to change anything in the imported code. Is there anything I can do on my side to fix this?

Thanks

5 answers
#2 · 2019-01-12 04:56

I had a similar problem. I tried to loop through all combinations of input and output charsets. Nothing helped! :(

However, I was able to access the code that actually fetched the data, and this is where the culprit lay. The data was fetched via cURL. Adding

 curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);

fixed it.

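A handy snippet to try out all possible combinations of a list of charsets:

// $text_2_convert is assumed to hold the string under test.
$charsets = array(
    "UTF-8",
    "ASCII",
    "Windows-1252",
    "ISO-8859-15",
    "ISO-8859-1",
    "ISO-8859-6",
    "CP1256"
);

foreach ($charsets as $from) {
    foreach ($charsets as $to) {
        // iconv() returns false (and raises a notice) when a conversion fails.
        $converted = @iconv($from, $to, $text_2_convert);
        echo "<h1>Combination $from to $to produces: </h1>"
            . ($converted === false ? "(conversion failed)" : $converted);
    }
}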
【Aperson】
#3 · 2019-01-12 05:01

You could replace your

$data = curl_exec($ch);

with

$data = utf8_decode(curl_exec($ch));

I had this same issue and it worked well for me.
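
For what it's worth, utf8_decode() converts UTF-8 to ISO-8859-1 and is deprecated as of PHP 8.2. A rough equivalent using iconv() might look like the sketch below. The helper name utf8_to_latin1 is made up for illustration, and the literal string stands in for the result of curl_exec($ch).

```php
<?php
// Hypothetical helper (not part of PHP): convert a UTF-8 string to
// ISO-8859-1, which is the conversion utf8_decode() performs.
function utf8_to_latin1(string $body): string
{
    // iconv() returns false and raises a notice if the input is not
    // valid UTF-8; in that case the body is returned unchanged.
    $converted = @iconv('UTF-8', 'ISO-8859-1', $body);
    return $converted === false ? $body : $converted;
}

// Stand-in for: $data = utf8_to_latin1(curl_exec($ch));
$data = utf8_to_latin1("Fran\xC3\xA7ais"); // "Français" as UTF-8 bytes
echo bin2hex($data), "\n";                 // 4672616ee7616973
```

This only helps when the remote side sends UTF-8 and your own page is served as ISO-8859-1; if your page is UTF-8 the conversion has to go the other way.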

萌系小妹纸
#4 · 2019-01-12 05:06

PHP seems to use UTF-8 by default, so I found that the following works:

$text = iconv("UTF-8","Windows-1252",$text);
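
One reason to target Windows-1252 rather than ISO-8859-1 for French text is the œ ligature, which exists in the former but not the latter. A small sketch of that conversion, with the sample word chosen here purely for illustration:

```php
<?php
// "cœur" as UTF-8 bytes; the ligature œ is U+0153 (0xC5 0x93 in UTF-8).
$utf8 = "c\xC5\x93ur";

// Windows-1252 has a slot for œ at byte 0x9C.
$cp1252 = iconv('UTF-8', 'Windows-1252', $utf8);
echo bin2hex($cp1252), "\n"; // 639c7572

// ISO-8859-1 has no œ at all, so a strict conversion cannot represent it.
// What iconv() does here (fail or substitute) depends on the iconv library.
$latin1 = @iconv('UTF-8', 'ISO-8859-1', $utf8);
var_dump($latin1);
```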

叼着烟拽天下
#5 · 2019-01-12 05:07

As Jon Skeet pointed out, it's difficult to understand your situation. However, if you only have access to the final text, you can try using iconv to change the text encoding.

For example:

$text = iconv("Windows-1252","UTF-8",$text);

I had a similar issue some time ago (with Italian and its special characters) and solved it this way.

Try different combinations (UTF-8, ISO-8859-1, Windows-1252).
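
Rather than converting blindly, one option is to check whether the text is already valid UTF-8 and only convert when it is not. The sketch below assumes Windows-1252 as the fallback source charset; the function name to_utf8 and that default are illustrative choices, not anything the question specifies.

```php
<?php
// Hypothetical helper: return $text as UTF-8, guessing the source charset.
function to_utf8(string $text, string $fallback = 'Windows-1252'): string
{
    // preg_match() with the /u modifier fails on invalid UTF-8,
    // so a successful match means the text can be kept as-is.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    // Otherwise assume the fallback charset; keep the raw text if
    // iconv() cannot convert it.
    $converted = @iconv($fallback, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

echo bin2hex(to_utf8("Fran\xE7ais")), "\n";     // Windows-1252 input, converted
echo bin2hex(to_utf8("Fran\xC3\xA7ais")), "\n"; // already UTF-8, untouched
```

Both calls should print the same bytes. The check is a heuristic: a short single-byte string can happen to be valid UTF-8 too, so it is worth testing against real samples.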

Animai°情兽
#6 · 2019-01-12 05:17

I'm currently suffering from a similar problem: I'm trying to write a simple HTML <title> importer via cURL. Here is what I've done so far:

  1. Retrieve the HTML via cURL
  2. Check whether the response headers give any hint of the encoding via curl_getinfo() and match it with a regex
  3. Parse the HTML to look at the content-type meta tag and the <title> tag (yes, I know the consequences)
  4. Compare the charset from the Content-Type header with the one from the meta tag and choose the meta one if they differ, because we know no one cares about their httpd configuration and there are a lot of dirty workarounds that rely on the meta tag
  5. iconv() the string
  6. Wish every day that whenever someone does not follow the standards, $DEITY punishes them until the end of days, because it would save me the meta parsing
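
Steps 2 to 5 could be sketched roughly as below. All three function names are hypothetical, the regexes are deliberately naive, and the Content-Type string is assumed to come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE) after the transfer.

```php
<?php
// Step 2: pull the charset out of a Content-Type value such as
// "text/html; charset=ISO-8859-1".
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Step 3: look for a charset declared in a <meta> tag.
function charset_from_meta(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Steps 3-5: extract <title> and convert it to UTF-8, preferring the
// meta charset over the header charset (step 4).
function title_as_utf8(string $html, ?string $contentType): ?string
{
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return null;
    }
    $charset = charset_from_meta($html)
        ?? charset_from_content_type($contentType)
        ?? 'UTF-8'; // assumption: treat undeclared pages as UTF-8
    $title = trim($m[1]);
    if ($charset === 'UTF-8') {
        return $title;
    }
    $converted = @iconv($charset, 'UTF-8', $title);
    return $converted === false ? $title : $converted;
}

$html = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=iso-8859-1">'
      . "<title> Caf\xE9 </title></head></html>";
echo title_as_utf8($html, 'text/html; charset=utf-8'), "\n";
```

Here the header claims UTF-8 but the meta tag says ISO-8859-1, so the title is converted from ISO-8859-1.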