Remove BOM () from imported .csv file

2019-01-27 02:50发布

问题:

I want to delete the BOM from my imported file, but it just doesn't seem to work.

I tried to preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $file); and a str_replace.

I hope anybody sees what I'm doing wrong.

$filepath = get_bloginfo('template_directory')."/testing.csv";
            setlocale(LC_ALL, 'nl_NL');
            ini_set('auto_detect_line_endings',TRUE);
            $file = fopen($filepath, "r") or die("Error opening file");
            $i = 0;
            while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
                if($i == 0) {
                    $c = 0;
                    foreach($line as $col) {
                        $cols[$c] = utf8_encode($col);
                        $c++;
                    }
                } else if($i > 0) {
                    $c = 0;
                    foreach($line as $col) {
                        $data[$i][$cols[$c]] = utf8_encode($col);
                        $c++;
                    }
                }
                $i++;
            }

-----------
SOLVED VERSION:

setlocale(LC_ALL, 'nl_NL');
ini_set('auto_detect_line_endings',TRUE);
require_once(ABSPATH.'wp-admin/includes/file.php' );

$path = get_home_path();        
$filepath = $path .'wp-content/themes/pon/testing.csv';
$content = file_get_contents($filepath); 
file_put_contents($filepath, str_replace("\xEF\xBB\xBF",'', $content));

// FILE_PUT_CONTENTS AUTOMATICCALY CLOSES THE FILE
$file = fopen($filepath, "r") or die("Error opening file"); 

$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    if($i == 0) {
        $c = 0;
        foreach($line as $col) {
            $cols[$c] = $col;
            $c++;
        }
    } else if($i > 0) {
        $c = 0;
        foreach($line as $col) {
            $data[$i][$cols[$c]] = $col;
            $c++;
        }
    }
    $i++;
}

I found that it removes the BOM and adjusts the file by overwriting it with the new data. The problem is that the rest of my script doesn't work anymore and I can't see why. It is a new .csv file

回答1:

Try this:

function removeBomUtf8($s){
  if(substr($s,0,3)==chr(hexdec('EF')).chr(hexdec('BB')).chr(hexdec('BF'))){
       return substr($s,3);
   }else{
       return $s;
   }
}


回答2:

Read data with file_get_contents then use mb_convert_encoding to convert to UTF-8

UPDATE

$filepath = get_bloginfo('template_directory')."/testing.csv";
$fileContent = file_get_contents($filepath);
$fileContent = mb_convert_encoding($fileContent, "UTF-8");
$lines = explode("\n", $fileContent);
foreach($lines as $line) {
    $conls = explode(";", $line);
    // etc...
}


回答3:

If the character encoding functions don't work for you (as is the case for me in some situations) and you know for a fact that your file always has a BOM, you can simply use an fseek() to skip the first 3 bytes, which is the length of the BOM.

$fp = fopen("testing.csv", "r");
fseek($fp, 3);

You should also not use explode() to split your CSV lines and columns because if your column contains the character by which you split, you will get an incorrect result. Use this instead:

while (!feof($fp)) {
    $arrayLine = fgetcsv($fp, 0, ";", '"');
    ...
}