How to specify encoding while process csv file in

2020-04-16 03:24发布

<?php
$row = 1;
$handle = fopen ("test.csv","r");
while ($data = fgetcsv ($handle, 1000, ",")) {
    $num = count ($data);
    print "<p> $num fields in line $row: <br>\n";
    $row++;
    for ($c=0; $c < $num; $c++) {
        print $data[$c] . "<br>\n";
    }
}
fclose ($handle);
?> 

The above comes from php manual,but I didn't see where to specify the encoding(like utf8 or so)

标签: php fgetcsv
3条回答
我想做一个坏孩纸
2楼-- · 2020-04-16 03:54

Try to change the locale.

Like it says below the example in the manual you gave:

Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

Suggested approach by comment on the same page:

setlocale(LC_ALL, 'ja_JP.UTF8'); // for japanese locale

From setlocale():

Locale names can be found in RFC 1766 and ISO 639. Different systems have different naming schemes for locales. […] On Windows, setlocale(LC_ALL, '') sets the locale names from the system's regional/language settings (accessible via Control Panel).

查看更多
够拽才男人
3楼-- · 2020-04-16 03:57

try this:

<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $data = array_map("utf8_encode", $data); //added
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
查看更多
Melony?
4楼-- · 2020-04-16 03:58

One such thing is the occurrence of the UTF byte order mark, or BOM. The UTF-8 character for the byte order mark is U+FEFF, or rather three bytes – 0xef, 0xbb and 0xbf – that sits in the beginning of the text file. For UTF-16 it is used to indicate the byte order. For UTF-8 it is not really necessary.

So you need to detect the three bytes and remove the BOM. Below is a simplified example on how to detect and remove the three bytes.

$str = file_get_contents('file.utf8.csv');
$bom = pack("CCC", 0xef, 0xbb, 0xbf);
if (0 == strncmp($str, $bom, 3)) {
    echo "BOM detected - file is UTF-8\n";
    $str = substr($str, 3);
}

That's all

查看更多
登录 后发表回答