PHP fread() Function Returning Extra Characters at

Posted 2019-08-14 08:45

When I use fread() on a normal text file (for example, an ANSI file saved with Notepad), the returned content string is correct, as expected.

But when I read a UTF-8 text file, the returned content string contains invisible characters at the front. I call them invisible because the extra characters can't be seen in normal output (e.g. an echo of what was just read). But when the content string is used for further processing (for example, building a link with an href value), a problem arises.

$filename = "blabla.txt";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
echo '<a href="'.$contents.'">'.$contents.'</a>';

I put only http://www.google.com in the UTF-8 encoded text file. When you run the PHP file, you will see an output link http://www.google.com, but you will never reach Google.

That is because the href in the page source looks like this:

%EF%BB%BFhttp://www.google.com

It seems that fread added the odd characters %EF%BB%BF at the front.

This is extremely annoying. Why is it happening?

Added:
Some are pointing out that this is the BOM. BOM or whatever, it is changing my original values, so now it causes problems in other steps, function calls, etc. Now I have to use substr($string, 3) on every output. Changing the original values like this is total nonsense.

2 Answers
smile是对你的礼貌
#2 · 2019-08-14 09:11

This is called the UTF-8 BOM. Please refer to http://en.wikipedia.org/wiki/Byte_order_mark

It is optionally added to the beginning of UTF-8 files, meaning it is in the file itself and is not something fread adds. Most text editors won't display the BOM, but some will, mostly those that don't understand it. Not all editors add it to UTF-8 files, but some do.
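To check that the bytes come from the file rather than from fread, one option is to dump the first three bytes in hex. The following is only a minimal sketch: the temporary file and its contents are assumptions made up for the demonstration, standing in for a file that an editor saved as UTF-8 with a BOM.

```php
<?php
// Hypothetical demo file; its name and location are assumptions for illustration.
$path = tempnam(sys_get_temp_dir(), 'bom');

// Write a UTF-8 BOM followed by the URL, as some editors do when saving as UTF-8.
file_put_contents($path, "\xEF\xBB\xBF" . "http://www.google.com");

$handle = fopen($path, 'rb');
$contents = fread($handle, filesize($path));
fclose($handle);
unlink($path);

// The first three bytes are the BOM that was already stored in the file.
$firstBytes = bin2hex(substr($contents, 0, 3));
echo $firstBytes, "\n"; // efbbbf

// Percent-encoding those same bytes gives the prefix seen in the href.
$encoded = rawurlencode(substr($contents, 0, 3));
echo $encoded, "\n"; // %EF%BB%BF
```

fread returns exactly what is stored on disk, so the three bytes show up in the string even though a browser or terminal does not render them.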

For UTF-8, using a BOM is not recommended: it carries no byte-order information, and many programs do not understand it.
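If the file cannot be re-saved without a BOM, a safer alternative to calling substr($string, 3) unconditionally is to remove the three bytes only when they are actually present. This is a sketch, not a built-in: strip_utf8_bom() is a hypothetical helper name, and the trim() and htmlspecialchars() calls are additions that the original snippet did not have.

```php
<?php
// strip_utf8_bom() is a hypothetical helper name, not a PHP built-in.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // Only remove the three bytes when they really are the BOM,
    // so files saved without one are left untouched.
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Simulated file contents: BOM followed by the URL.
$withBom = "\xEF\xBB\xBF" . "http://www.google.com";

// trim() also drops a trailing newline that an editor may have added.
$url = trim(strip_utf8_bom($withBom));
echo '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($url) . '</a>', "\n";
```

Because the check is conditional, the same call works for both BOM and non-BOM files, which a fixed substr($string, 3) does not.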

Melony?
#3 · 2019-08-14 09:18

It is the UTF-8 BOM. If you look at the docs for fread (here), someone has discussed a solution for it.

The solution given there is the following:

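// Opens a file and reads past the UTF-8 BOM if it is there.
function fopen_utf8($filename, $mode) {
    $file = @fopen($filename, $mode);
    if ($file === false) {
        return false; // the file could not be opened
    }
    $bom = fread($file, 3);
    if ($bom !== b"\xEF\xBB\xBF") {
        rewind($file); // no BOM: go back to the start (rewind() takes only the handle)
    } else {
        echo "bom found!\n";
    }
    return $file;
}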