Excel csv export into a php file with fgetcsv

2020-03-25 04:52发布

I'm using excel 2010 professional plus to create an excel file. Later on I'm trying to export it as a UTF-8 .csv file. I do this by saving it as CSV (symbol separated.....sry I know not the exact wording there but I don't have the english version and I fear it is translated differently than 1:1). There I click on tools->weboptions and select unicode (UTF-8) as encoding. The example .csv is as follows:

ID;englishName;germanName
1;Austria;Österreich

So far so good, but if I open the file now with my php code:

 header('Content-Type: text/html; charset=UTF-8');
 iconv_set_encoding("internal_encoding", "UTF-8");
 iconv_set_encoding("output_encoding", "UTF-8");
 setlocale(LC_ALL, 'de_DE.utf8');
 $fp=fopen($filePathName,'r');
 while (($dataRow= fgetcsv($fp,0,";",'"') )!==FALSE)
 {
     print_r($dataRow);
 }
  • I get: �sterreich as a result on the screen (as that is the "error" I cut all other parts of the result).
  • If I open the file with notedpad++ and look at the encoding I see "ANSI" instead of UTF-8.
  • If I change the encoding in notepad++ to UTF8....the ö,ä,... are replaced by special chars, which I have to correct manually.

If I go another route and create a new UTF-8 file with notedpad++ and put in the same data as in the excel file I get shown "Österreich" on screen when I open it with the php file.

Now the question I have is, why does it not function with excel, thus am I doing something wrong here? Or am I overlooking something?

Edit: As the program will in the end be installed on windows servers provided by customers, a solution is needed where it is not necessary to install additional tools (php libraries,... are ok, but having to install a vm-ware or cygwin,... is not). Also there won't be a excel (or office) locally installed on the server as the customer will be able to upload the .csv file via a file upload dialog (the dialog itself is not part of the problem, as I know how to handle those and additionally the problem itself I stumbled over when I created an excel file and converted it to .csv on a testmachine where excel was locally installed).

Tnx

标签: php excel csv
5条回答
萌系小妹纸
2楼-- · 2020-03-25 05:00

From what you say, I suspect excel writes an UTF-8 file without BOM, which makes guessing that the encoding is utf-8 slightly trickier. You can confirm this diagnostic if the characters appear correctly in Notepad++ when pressing to Format->Encode in UTF-8 (without BOM) (rather than Format->Convert to UTF-8 (without BOM)).

And are you sure every user is going to use UTF-8 ? Sounds to me that you need something that does a little smart guessing of what your real input encoding is. By "smart", I mean that this guessing recognizes BOM-less UTF-8.

To cut to the chase, I'd do something like that :

$f = fopen('file.csv', 'r');

while( ($row = fgets($f)) != null )
    if( mb_detect_encoding($row, 'UTF-8', true) !== false )
        var_dump(str_getcsv( $row, ';' ));
    else
        var_dump(str_getcsv( utf8_encode($row), ';' ));

fclose($f);

Which works because you read the characters to guess the encoding, rather than lazily trusting the first 3 characters : so UTF-8 without BOM would still be recognized as UTF-8. Of course if your csv file is not too big you could do that encoding detection on the whole file contents : something like mb_detect_encoding(file_get_contents(...), ...)

查看更多
Juvenile、少年°
3楼-- · 2020-03-25 05:13

The problem must be your file encoding, it looks it's not utf-8.

When I tried your example and double checked file that is indeed utf-8, it works for me, I get:

Array ( [0] => 1 [1] => Austria [2] => Österreich )

Use LibreOffice (OpenOffice), it's more reliable for these sort of things.

查看更多
干净又极端
4楼-- · 2020-03-25 05:15

From PHP DOC

Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

You can try

header('Content-Type: text/html; charset=UTF-8');
$fp = fopen("log.txt", "r");
echo "<pre>";
while ( ($dataRow = fgetcsv($fp, 1000, ";")) !== FALSE ) {
    $dataRow = array_map("utf8_encode", $dataRow);
    print_r($dataRow);
}

Output

Array
(
    [0] => ID
    [1] => englishName
    [2] => germanName
)
Array
(
    [0] => 1
    [1] => Austria
    [2] => Österreich
)
查看更多
Lonely孤独者°
5楼-- · 2020-03-25 05:15

And for the people from Czech republic:

function convert( $str ) {
    return iconv( "CP1250", "UTF-8", $str );
}
...
while (($data = fgetcsv($this->fhandle, 1000, ";")) !== FALSE) {
$data = array_map( "convert", $data );
...
查看更多
孤傲高冷的网名
6楼-- · 2020-03-25 05:17

I don't know why Excel is generating a ANSI file instead of UTF-8 (as you can see in Notepad++), but if this is the case, you can convert the file using iconv:

iconv --from-code=ISO-8859-1 --to-code=UTF-8 my_csv_file.csv > my_csv_file_utf8.csv

查看更多
登录 后发表回答