Microsoft Excel mangles Diacritics in .csv files?

2018-12-31 07:38发布

I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1 (note the accented e). The data is utf-8 (no prepended BOM).

When I open this file in MS Excel is displays as Numéro 1.

I am able to open this in a text editor (UltraEdit) which displays it correctly. UE reports the character is decimal 233.

How can I export text data in a .csv file so that MS Excel will correctly render it, preferably without forcing the use of the import wizard, or non-default wizard settings?

22条回答
柔情千种
2楼-- · 2018-12-31 07:47

Below is the PHP code I use in my project when sending Microsoft Excel to user:

  /**
   * Export an array as downladable Excel CSV
   * @param array   $header
   * @param array   $data
   * @param string  $filename
   */
  function toCSV($header, $data, $filename) {
    $sep  = "\t";
    $eol  = "\n";
    $csv  =  count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach($data as $line) {
      $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }
    $encoded_csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename="'.$filename.'.csv"');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. strlen($encoded_csv));
    echo chr(255) . chr(254) . $encoded_csv;
    exit;
  }

UPDATED: Filename improvement and BUG fix correct length calculation. Thanks to TRiG and @ivanhoe011

查看更多
妖精总统
3楼-- · 2018-12-31 07:47

Echo UTF-8 BOM before outputing CSV data. This fixes all character issues in Windows but doesnt work for Mac.

echo "\xEF\xBB\xBF";

It works for me because I need to generate a file which will be used on Windows PCs only.

查看更多
素衣白纱
4楼-- · 2018-12-31 07:49

The CSV format is implemented as ASCII, not unicode, in Excel, thus mangling the diacritics. We experienced the same issue which is how I tracked down that the official CSV standard was defined as being ASCII-based in Excel.

查看更多
长期被迫恋爱
5楼-- · 2018-12-31 07:52

UTF-8 doesn't work for me in office 2007 without any service pack, with or without BOM (U+ffef or 0xEF,0xBB,0xBF , neither works) installing sp3 makes UTF-8 work when 0xEF,0xBB,0xBF BOM is prepended.

UTF-16 works when encoding in python using "utf-16-le" with a 0xff 0xef BOM prepended, and using tab as seperator. I had to manually write out the BOM, and then use "utf-16-le" rather then "utf-16", otherwise each encode() prepended the BOM to every row written out which appeared as garbage on the first column of the second line and after.

can't tell whether UTF-16 would work without any sp installed, since I can't go back now. sigh

This is on windows, dunno about office for MAC.

for both working cases, the import works when launching a download directly from the browser and the text import wizard doesn't intervence, it works like you would expect.

查看更多
裙下三千臣
6楼-- · 2018-12-31 07:52

This is just of a question of character encodings. It looks like you're exporting your data as UTF-8: é in UTF-8 is the two-byte sequence 0xC3 0xA9, which when interpreted in Windows-1252 is é. When you import your data into Excel, make sure to tell it that the character encoding you're using is UTF-8.

查看更多
十年一品温如言
7楼-- · 2018-12-31 07:52

Another solution I found was just to encode the result as Windows Code Page 1252 (Windows-1252 or CP1252). This would be done, for example by setting Content-Type appropriately to something like text/csv; charset=Windows-1252 and setting the character encoding of the response stream similarly.

查看更多
登录 后发表回答