How to remove multiple UTF-8 BOM sequences before

2018-12-31 22:42发布

Using PHP5 (cgi) to output template files from the filesystem and having issues spitting out raw HTML.

private function fetch($name) {
    $path = $this->j->config['template_path'] . $name . '.html';
    if (!file_exists($path)) {
        dbgerror('Could not find the template "' . $name . '" in ' . $path);
    }
    $f = fopen($path, 'r');
    $t = fread($f, filesize($path));
    fclose($f);
    if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
        $t = substr($t, 3);
    }
    return $t;
}

Even though I've added the BOM fix I'm still having problems with Firefox accepting it. You can see a live copy here: http://ircb.in/jisti/ (and the template file I threw at http://ircb.in/jisti/home.html if you want to check it out)

Any idea how to fix this? o_o

9条回答
孤独寂梦人
2楼-- · 2018-12-31 23:07

This global funtion resolve for UTF-8 system base charset. Tanks!

function prepareCharset($str) {

    // set default encode
    mb_internal_encoding('UTF-8');

    // pre filter
    if (empty($str)) {
        return $str;
    }

    // get charset
    $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));

    if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
        $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
    } else {
        $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    }

    // remove BOM
    $str = urldecode(str_replace("%C2%81", '', urlencode($str)));

    // prepare string
    return $str;
}
查看更多
有味是清欢
3楼-- · 2018-12-31 23:09

if anybody using csv import then below code useful

           $header = fgetcsv($handle);
            foreach($header as $key=> $val) {
                $bom = pack('H*','EFBBBF');
                $val = preg_replace("/^$bom/", '', $val);
                $header[$key] = $val;
            }
查看更多
查无此人
4楼-- · 2018-12-31 23:18

you would use the following code to remove utf8 bom

//Remove UTF8 Bom

function remove_utf8_bom($text)
{
    $bom = pack('H*','EFBBBF');
    $text = preg_replace("/^$bom/", '', $text);
    return $text;
}
查看更多
冷夜・残月
5楼-- · 2018-12-31 23:18

An extra method to do the same job:

function remove_utf8_bom_head($text) {
    if(substr(bin2hex($text), 0, 6) === 'efbbbf') {
        $text = substr($text, 3);
    }
    return $text;
}

The other methods I found cannot work in my case.

Hope it helps in some special case.

查看更多
唯独是你
6楼-- · 2018-12-31 23:20

b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf". If you want to check for a BOM, you need to use double quotes, so the \x sequences are actually interpreted into bytes:

"\xef\xbb\xbf"

Your files also seem to contain a lot more garbage than just a single leading BOM:

$ curl http://ircb.in/jisti/ | xxd

0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef  ................
0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068  .....<!DOCTYPE h
0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561  tml>.<html>.<hea
...
查看更多
看风景的人
7楼-- · 2018-12-31 23:24

try:

// -------- read the file-content ----
$str = file_get_contents($source_file); 

// -------- remove the utf-8 BOM ----
$str = str_replace("\xEF\xBB\xBF",'',$str); 

// -------- get the Object from JSON ---- 
$obj = json_decode($str); 

:)

查看更多
登录 后发表回答