I'm using Excel 2010 Professional Plus to create an Excel file, and later on I'm trying to export it as a UTF-8 .csv file. I do this by saving it as CSV (semicolon separated... sorry, I know that's not the exact wording, but I don't have the English version and I fear it is not translated 1:1). There I click Tools -> Web Options and select Unicode (UTF-8) as the encoding. The example .csv is as follows:
ID;englishName;germanName
1;Austria;Österreich
So far so good, but when I open the file with my PHP code:
header('Content-Type: text/html; charset=UTF-8');
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "UTF-8");
setlocale(LC_ALL, 'de_DE.utf8');
$fp = fopen($filePathName, 'r');
while (($dataRow = fgetcsv($fp, 0, ';', '"')) !== FALSE)
{
    print_r($dataRow);
}
fclose($fp);
- I get �sterreich as a result on the screen (as that is the "error", I cut all other parts of the result).
- If I open the file with Notepad++ and look at the encoding, I see "ANSI" instead of UTF-8.
- If I change the encoding in Notepad++ to UTF-8, the ö, ä, ... are replaced by special characters, which I then have to correct manually.
If I go another route and create a new UTF-8 file with Notepad++, putting in the same data as in the Excel file, "Österreich" is shown on screen when I open it with the PHP file.
Now my question is: why does it not work with Excel? Am I doing something wrong here, or am I overlooking something?
Edit: As the program will in the end be installed on Windows servers provided by customers, I need a solution that does not require installing additional tools (PHP libraries etc. are OK, but having to install VMware, Cygwin, etc. is not). Also, there won't be an Excel (or Office) installation on the server, as the customer will upload the .csv file via a file upload dialog. (The dialog itself is not part of the problem, as I know how to handle those; I stumbled over the problem itself when I created an Excel file and converted it to .csv on a test machine where Excel was installed locally.)
Thanks
From what you say, I suspect Excel writes a UTF-8 file without a BOM, which makes guessing that the encoding is UTF-8 slightly trickier. You can confirm this diagnosis in Notepad++: the characters appear correctly when you select
Format -> Encode in UTF-8 (without BOM)
(rather than Format -> Convert to UTF-8 (without BOM)).
And are you sure every user is going to use UTF-8? It sounds to me like you need something that does a little smart guessing of what your real input encoding is. By "smart", I mean guessing that recognizes BOM-less UTF-8.
To cut to the chase, I'd do something like this:
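A minimal sketch of that idea (guessCsvEncoding() is my own helper name, not a built-in; the temp file only serves as demo input):

```php
<?php
// Sketch: detect the encoding from the file's actual contents (so BOM-less
// UTF-8 is still recognized), then convert each row to UTF-8 while reading.
function guessCsvEncoding(string $path): string
{
    // Read a sample of the file; strict mode (3rd argument = true) makes
    // mb_detect_encoding() reject invalid UTF-8 instead of guessing blindly.
    $sample = file_get_contents($path, false, null, 0, 8192);
    $enc = mb_detect_encoding($sample, ['UTF-8', 'ISO-8859-1'], true);
    return $enc ?: 'ISO-8859-1'; // fall back to a single-byte encoding
}

// Demo input: "Österreich" encoded as ISO-8859-1 (0xD6 = Ö).
$filePathName = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($filePathName, "1;Austria;\xD6sterreich\n");

$enc = guessCsvEncoding($filePathName);
$fp  = fopen($filePathName, 'r');
while (($dataRow = fgetcsv($fp, 0, ';', '"')) !== false) {
    if ($enc !== 'UTF-8') {
        // Convert every field to UTF-8 before using it.
        $dataRow = array_map(function ($f) use ($enc) {
            return iconv($enc, 'UTF-8', $f);
        }, $dataRow);
    }
    print_r($dataRow); // [2] is "Österreich" in UTF-8 either way
}
fclose($fp);
```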
This works because you read actual characters to guess the encoding, rather than lazily trusting the first 3 bytes (the BOM): UTF-8 without a BOM is still recognized as UTF-8. Of course, if your CSV file is not too big, you could do the encoding detection on the whole file contents, with something like
mb_detect_encoding(file_get_contents(...), ...)
The problem must be your file's encoding; it looks like it's not UTF-8.
When I tried your example with a file that I double-checked is indeed UTF-8, it works for me, and I get:
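For reference, here's a self-contained sketch of that test, writing the sample rows as genuine UTF-8 and reading them back (the temp file is just for the demo):

```php
<?php
// Write the question's sample data as genuine UTF-8, then read it with
// the same fgetcsv() call as in the question.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "ID;englishName;germanName\n1;Austria;Österreich\n");

$fp = fopen($path, 'r');
while (($row = fgetcsv($fp, 0, ';', '"')) !== false) {
    print_r($row); // the second array printed ends with [2] => Österreich
}
fclose($fp);
```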
Use LibreOffice (OpenOffice) instead; it's more reliable for this sort of thing.
From the PHP documentation for fgetcsv(): the locale setting is taken into account by this function, so if LC_CTYPE is e.g. en_US.UTF-8, files in one-byte encodings may be read wrongly.
You can try setting a locale whose charset matches the file before reading it, and check the output. And for people from the Czech Republic, the analogous locale would be cs_CZ.UTF-8.
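A minimal sketch of that approach (the locale names assume de_DE.UTF-8 or cs_CZ.UTF-8 is installed on the server; the temp file is demo input):

```php
<?php
// Make fgetcsv()'s locale-aware parsing match the file's charset.
setlocale(LC_ALL, 'de_DE.UTF-8');
// For readers from the Czech Republic, the analogue would be:
// setlocale(LC_ALL, 'cs_CZ.UTF-8');

$path = tempnam(sys_get_temp_dir(), 'csv'); // demo file
file_put_contents($path, "1;Austria;Österreich\n");

$fp = fopen($path, 'r');
while (($row = fgetcsv($fp, 0, ';', '"')) !== false) {
    print_r($row);
}
fclose($fp);
```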
I don't know why Excel is generating an ANSI file instead of UTF-8 (as you can see in Notepad++), but if this is the case, you can convert the file using iconv:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv
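Since the question rules out extra command-line tools on the customer's server, the same conversion can be done with PHP's iconv() function; a sketch, using an ISO-8859-1 demo file in place of the uploaded one:

```php
<?php
// Convert an ISO-8859-1 (ANSI) CSV to UTF-8 server-side, no CLI tool needed.
$in  = tempnam(sys_get_temp_dir(), 'csv'); // stands in for the uploaded file
$out = $in . '.utf8';
file_put_contents($in, "1;Austria;\xD6sterreich\n"); // 0xD6 = Ö in ISO-8859-1

$utf8 = iconv('ISO-8859-1', 'UTF-8', file_get_contents($in));
file_put_contents($out, $utf8);
```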