I am using jqueryFileTree to show a directory listing on the server, with download links to the files in the directory. Recently I've run into an issue with files whose names contain special characters:
- test.pdf : works fine
- tést.pdf : does not work (notice the é - acute accent - in the filename)
When debugging the PHP connector of jqueryFileTree, I see that it does a scandir() of the directory passed via $_GET and then loops over each file/dir in that directory. Before putting the filename into the URL, the script correctly runs htmlentities() over it. The problem is that this htmlentities($file) call just returns an empty string. According to the PHP docs, this can happen when the input string contains an invalid code unit sequence for the given encoding. However, I tried passing the charset explicitly by calling:
$file = htmlentities($file,ENT_QUOTES,'UTF-8');
But this also returns an empty string.
If I call: $file = htmlentities($file,ENT_IGNORE,'UTF-8'); the é character is just dropped (so tést.pdf becomes tst.pdf).
When debugging my PHP script with Xdebug, I can see that the source string contains an unknown character.
So I'm quite at my wits' end trying to find a solution for this. Any help would be welcome.
FYI:
- The charset of my page is UTF-8 (specified in metadata)
- The file is stored on a Windows 2003 file server, and scandir() is executed with the UNC path (e.g. //fileserver/sharename/sourcedir)
- The default encoding in my php.ini is set to UTF-8
- The webserver and PHP 5.4.26 are running on a Windows 2008 R2 server
My best guess is that the filename itself isn't using UTF-8. Or at least scandir() isn't picking it up like that. Maybe mb_detect_encoding() can shed some light?

If not, try to guess the encoding (CP1252 or ISO-8859-1 would be my first guess), convert it to UTF-8, and see if the output is valid:
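A minimal sketch of such a check. The sample bytes are an assumption standing in for whatever scandir() returns on your server, and Windows-1252 (CP1252) is only a first guess at the source encoding:

```php
<?php
// Hypothetical sample: "tést.pdf" as CP1252/ISO-8859-1 bytes (é is the single byte 0xE9).
$file = "t\xE9st.pdf";

// A lone 0xE9 followed by "s" is not a valid UTF-8 sequence,
// which would explain why htmlentities(..., 'UTF-8') gives up on it.
var_dump(mb_check_encoding($file, 'UTF-8')); // bool(false)

// Strict detection among a few candidates. This is only a heuristic:
// single-byte encodings cannot be reliably told apart from each other.
var_dump(mb_detect_encoding($file, array('UTF-8', 'ISO-8859-1'), true));

// Convert on the assumption that the source is Windows-1252, then re-check.
$utf8 = mb_convert_encoding($file, 'UTF-8', 'Windows-1252');
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)

echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // t&eacute;st.pdf
```

If the converted name displays correctly in the page, the guess was probably right; if you get mojibake instead, try the next candidate encoding.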
Or using iconv():
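Roughly like this, again assuming CP1252 as the source encoding and using hypothetical sample bytes in place of a real scandir() result:

```php
<?php
// Hypothetical sample: "tést.pdf" as CP1252 bytes.
$file = "t\xE9st.pdf";

// Note the argument order: iconv(from, to, string).
$utf8 = iconv('CP1252', 'UTF-8', $file);

if ($utf8 === false) {
    // iconv() returns false when the input has bytes that are invalid in the source charset.
    echo "Conversion failed, so the source is probably not CP1252\n";
} else {
    echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // t&eacute;st.pdf
}
```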
Then, when you've figured out which encoding is actually used, your code should look somewhat like this (assuming CP1252):
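A sketch of how that could fit into the connector's loop. The helper name escapeFileName() and the $dir placeholder are made up for illustration and are not part of jqueryFileTree; CP1252 is still an assumption you should confirm first:

```php
<?php
// Hypothetical helper: convert a raw directory entry to UTF-8, then escape it for HTML.
// $sourceEncoding is an assumption; adjust it to whatever your file server really uses.
function escapeFileName($file, $sourceEncoding = 'CP1252')
{
    $utf8 = iconv($sourceEncoding, 'UTF-8', $file);
    if ($utf8 === false) {
        // Conversion failed: fall back to dropping the offending bytes
        // rather than returning an empty string.
        return htmlentities($file, ENT_QUOTES | ENT_IGNORE, 'UTF-8');
    }
    return htmlentities($utf8, ENT_QUOTES, 'UTF-8');
}

// Placeholder directory; in the question this would be the UNC path passed via $_GET.
$dir = __DIR__;

foreach (scandir($dir) as $file) {
    if ($file === '.' || $file === '..') {
        continue; // skip the current/parent directory entries
    }
    echo escapeFileName($file), "\n";
}
```

Plain ASCII names pass through the conversion unchanged, so test.pdf keeps working as before.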