When I use fread() on a normal text file (for example, an ANSI file saved from Notepad), the returned string is correct, as expected.
But when I read a UTF-8 text file, the returned string contains invisible characters at the front. I call them invisible because they can't be seen in normal output (e.g. when you just echo the string). But when the string is used for further processing (for example, building a link with it as the href value), a problem arises.
$filename = "blabla.txt";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
echo '<a href="'.$contents.'">'.$contents.'</a>';
I put only http://www.google.com in the UTF-8 encoded text file. When you run the PHP file, you will see an output link reading http://www.google.com, but you will never reach Google, because the href in the page source looks like this:
%EF%BB%BFhttp://www.google.com
It means fread added the weird characters %EF%BB%BF at the front.
This is extra annoying. Why is it happening?
Added:
Some are pointing out that this is the BOM. BOM or whatever, it is changing my original values, so now it causes problems in other steps, function calls, etc. Now I have to use substr($string, 3) on all outputs. Changing the original values like this makes no sense.
This is called the UTF-8 BOM. Please refer to http://en.wikipedia.org/wiki/Byte_order_mark
It is optionally added to the beginning of UTF-8 files, meaning it is in the file and is not something fread adds. Most text editors won't display the BOM, but some will, mostly those that don't understand it. Not all editors add it to UTF-8 files, but some do.
For UTF-8, using a BOM is not recommended: it carries no byte-order information, and many programs do not understand it.
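Since the three bytes are in the file itself, one way to deal with them is to strip them only when they are actually present, rather than calling substr($string, 3) unconditionally. Here is a minimal sketch of that idea. The helper name stripUtf8Bom is hypothetical (it is not a PHP built-in), and the temporary file only stands in for blabla.txt so the example can run on its own.

```php
<?php
// Hypothetical helper (not part of PHP): removes a leading UTF-8 BOM if present.
function stripUtf8Bom(string $text): string
{
    // The three BOM bytes; they show up as %EF%BB%BF when URL-encoded.
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    // No BOM at the start: return the text untouched.
    return $text;
}

// Assumption for the demo: a temporary file stands in for blabla.txt.
$filename = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($filename, "\xEF\xBB\xBFhttp://www.google.com");

// Read the file as in the question, then drop the BOM.
$handle = fopen($filename, "r");
$contents = stripUtf8Bom(fread($handle, filesize($filename)));
fclose($handle);
unlink($filename);

echo '<a href="' . $contents . '">' . $contents . '</a>';
```

Because the check looks for the exact byte sequence, files saved without a BOM pass through unchanged, so the same call should be safe for both kinds of file. Saving the file as "UTF-8 without BOM" in the editor, where that option exists, avoids the problem at the source.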
It is the UTF-8 BOM. If you look at the docs for fread (here), someone has discussed a solution for it.
The solution given there is the following: