I have a directory that contains several files, many of which have non-English names. I am using PHP on Windows 7.
I want to list the file names and their contents using PHP.
Currently I am using DirectoryIterator and file_get_contents. This works for English file names but not for non-English (for example, Chinese) file names.
For example, I have filenames like "एक और प्रोब्लेम.eml", "hello 鶨鶖鵨鶣鎹鎣.eml".
DirectoryIterator is not able to get the file name using ->getFilename(), and file_get_contents is not able to open the file even if I hard-code the file name in its parameter.
How can I do it?
To discover the files I have this script:
This will successfully find the file 鶨鶖鵨鶣鎹鎣. I tried it on a Linux distro, though.
to read it you use: Line by line:
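A minimal sketch of both steps (listing the files, then reading one line by line), assuming PHP 7 or later. The helper names list_file_names and read_lines are hypothetical, and this is not necessarily the script this answer used. As noted above, it is expected to work on Linux; on Windows the code page limitation described in the other answers still applies.

```php
<?php
// Hypothetical helpers; names are illustrative, not from the original answer.

/** Return the names of the regular files in $dir, sorted. */
function list_file_names(string $dir): array
{
    $names = [];
    foreach (new DirectoryIterator($dir) as $entry) {
        if ($entry->isFile()) {
            $names[] = $entry->getFilename();
        }
    }
    sort($names);
    return $names;
}

/** Read a file line by line and return the lines without trailing newlines. */
function read_lines(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }
    fclose($handle);
    return $lines;
}
```

Calling list_file_names() on the directory and then read_lines() on each returned name (joined to the directory path) gives the file names and their contents.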
Short reply:
Under Windows, you cannot access arbitrary file names with PHP; you are limited to file names that can be represented in the currently selected "code page" (see "Regional and Language Options", "Format" panel, "Administrative" tab, "Language for non-Unicode programs").
Longer reply:
Windows has used UTF-16 for file names since Windows 2000, but PHP communicates with the underlying file system as a "non-Unicode aware program". This means that there is a current "code page table" that translates from PHP strings to UTF-16 strings and vice versa. From PHP, the current code page can be retrieved with setlocale(), which reports it in the form "language_country.codepage", for example:
setlocale(LC_CTYPE, 0) ==> "english_United States.1252"
where 1252 is the Windows code page currently selected in the Control Panel. File names retrieved from the file system are encoded using that code page, and file names generated from PHP must be encoded according to that code page. Things are further complicated by the fact that UTF-16 file names are translated to PHP strings using the "best-fit code page", that is, an approximate representation of the actual characters. You therefore cannot trust file names and paths retrieved from the file system, as they might be arbitrarily mangled.
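As a rough illustration of the point above, the following sketch extracts the code page from the locale string and checks whether a UTF-8 file name can be represented in it. The helpers codepage_from_locale and is_representable are hypothetical, the iconv extension is assumed to be available, and the "CP" + number naming for the iconv charset is an assumption that may not hold for every code page.

```php
<?php
// Hypothetical helpers for illustration; they are not part of PHP or of the original answer.

/** Extract the numeric code page from a locale string such as "english_United States.1252". */
function codepage_from_locale(string $locale): ?string
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? $m[1] : null;
}

/** Check whether a UTF-8 name survives a round trip through $charset, e.g. "CP1252". */
function is_representable(string $utf8Name, string $charset): bool
{
    // iconv() returns false (with a notice, suppressed here) when conversion fails.
    // It also fails if the charset name is unknown to iconv.
    $encoded = @iconv('UTF-8', $charset, $utf8Name);
    if ($encoded === false) {
        return false;
    }
    // Some iconv implementations substitute a placeholder instead of failing,
    // so compare the round-tripped string with the original as well.
    return iconv($charset, 'UTF-8', $encoded) === $utf8Name;
}

// "0" asks setlocale() to report the current setting without changing it.
$locale = setlocale(LC_CTYPE, '0');
$codepage = is_string($locale) ? codepage_from_locale($locale) : null;

// On non-Windows systems the locale usually has no numeric code page, so nothing is printed.
if ($codepage !== null) {
    $name = 'hello 鶨鶖鵨鶣鎹鎣.eml';
    // Assumption: iconv knows the code page under the name "CP<number>".
    echo is_representable($name, 'CP' . $codepage) ? "representable\n" : "not representable\n";
}
```

A name that fails this check is one that PHP, under the limitation described here, would not be able to pass to the file system intact.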
References:
http://en.wikipedia.org/wiki/Windows_code_page What "Windows code pages" are.
https://bugs.php.net/bug.php?id=47096 More details about this issue.
This is not possible. It's a limitation of PHP. PHP uses the multibyte versions of Windows APIs; you're limited to the characters your codepage can represent.
See this answer.
Directory contents:
Test file contents:
Test file results:
Debugger output:
Call stack (PHP 5.3.0):
Is it really a question mark?
Yes! It's character #63.
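One way to make this kind of check yourself is to look at the raw bytes of the name PHP received. The sketch below uses a hypothetical mangled name as a stand-in (it is not the actual debugger output) and a hypothetical helper count_question_marks. Since "?" is not a valid character in Windows file names, a name containing a literal "?" cannot refer to an existing file, which would be consistent with the open failing.

```php
<?php
// Hypothetical example: $mangled stands in for a name as PHP might receive it
// after unrepresentable characters were replaced; it is not real debugger output.

/** Count the bytes in $name that are a literal question mark (character code 63). */
function count_question_marks(string $name): int
{
    return substr_count($name, chr(63));
}

$mangled = 'hello ??????.eml';
printf(
    "%d question mark(s), bytes in hex: %s\n",
    count_question_marks($mangled),
    bin2hex($mangled)
);
```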