I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1
(note the accented e).
The data is utf-8
(no prepended BOM).
When I open this file in MS Excel is displays as Numéro 1
.
I am able to open this in a text editor (UltraEdit) which displays it correctly. UE reports the character is decimal 233
.
How can I export text data in a .csv file so that MS Excel will correctly render it, preferably without forcing the use of the import wizard, or non-default wizard settings?
Below is the PHP code I use in my project when sending Microsoft Excel to user:
UPDATED: Filename improvement and BUG fix correct length calculation. Thanks to TRiG and @ivanhoe011
Echo UTF-8 BOM before outputing CSV data. This fixes all character issues in Windows but doesnt work for Mac.
It works for me because I need to generate a file which will be used on Windows PCs only.
The CSV format is implemented as ASCII, not unicode, in Excel, thus mangling the diacritics. We experienced the same issue which is how I tracked down that the official CSV standard was defined as being ASCII-based in Excel.
UTF-8 doesn't work for me in office 2007 without any service pack, with or without BOM (U+ffef or 0xEF,0xBB,0xBF , neither works) installing sp3 makes UTF-8 work when 0xEF,0xBB,0xBF BOM is prepended.
UTF-16 works when encoding in python using "utf-16-le" with a 0xff 0xef BOM prepended, and using tab as seperator. I had to manually write out the BOM, and then use "utf-16-le" rather then "utf-16", otherwise each encode() prepended the BOM to every row written out which appeared as garbage on the first column of the second line and after.
can't tell whether UTF-16 would work without any sp installed, since I can't go back now. sigh
This is on windows, dunno about office for MAC.
for both working cases, the import works when launching a download directly from the browser and the text import wizard doesn't intervence, it works like you would expect.
This is just of a question of character encodings. It looks like you're exporting your data as UTF-8: é in UTF-8 is the two-byte sequence 0xC3 0xA9, which when interpreted in Windows-1252 is é. When you import your data into Excel, make sure to tell it that the character encoding you're using is UTF-8.
Another solution I found was just to encode the result as Windows Code Page 1252 (Windows-1252 or CP1252). This would be done, for example by setting
Content-Type
appropriately to something liketext/csv; charset=Windows-1252
and setting the character encoding of the response stream similarly.