PHPExcel: How to check whether a XLS file is valid

2020-02-05 11:23发布

问题:

I'm using PHPExcel 1.7.8 to read .xls files, uploaded by a radom user. All is working properly with a valid .xls file, but now I wanted to make some tests with invalid files to check if the program displays good error messages.

So I took a .csv file, and renamed it with .xls (without converting anything, just changing the name) to the end, just to check...
Broken! :)

DOM ELEMENT: HTML
DOM ELEMENT: BODY
DOM ELEMENT: P
START OF PARAGRAPH: 
END OF PARAGRAPH:
FLUSH CELL: A1 => block,date,hour...

array
  1 =>
    array
      'A' => string  'block,date,hour...' (length=2777)

{"step":"error","errors":[],"warnings":[]}

Like you can see, there's an error message displaying, I didn't ask for that, and then the JSON that I usually write.

It happens on this line :

<?php
echo "Loading file\n";
try {
    if (!($objPHPExcel = PHPExcel_IOFactory::load('path'))) {
        echo "Failed\n";
        return;
        // ...
    }
} catch(Exception $e) {
    echo 'Exception !';
}
echo "Done\n";

And this code displays:

Loading file
/!\ ERROR MESSAGE ABOVE /!\
Done

My question is, is there a way with PHPExcel or anything else to check whether a file is a valid XLS file before I try to parse it?

Thank you.

回答1:

Each reader in PHPExcel has a canRead() method that validates the file passed in to the read is of the appropriate format for that reader - the method returns a simple boolean True or False. A return of True from a call to the canRead() method of the PHPExcel_Reader_Excel5 class will confirm that the file can be read by that reader, irrespective of the file extension.

The IOFactory identify() method uses this call, testing against the Readers for each supported format in turn until it gets a true return from the canRead() call. The IOFactory load() method, in it's turn, uses identify() to determine which Reader should be used for the specified file.

The ability to verify a filetype (without depending on the file extension which can often be misleading) is particularly useful when you want to set additional arguments for the reader.

The fallback from identify()/load() is slightly less satisfactory: if canRead() returns false for all other Readers, then the file is treated as a CSV.



回答2:

Even if it's more than a year question, I still find it cumbersome to figure it out how to deal with this issue, I'll try to post my answer here.

If using try/catch block doesn't work (in my case, I renamed a jpg file to xls and the error handler doesn't work, instead of throwing error, invalid file just throws a warning), you can consider a manual checking using canRead() as Mark said, here's an example of how to use this function.

If you know what your filetypes are, you can define it manually and check against them:

$valid = false;
$types = array('Excel2007', 'Excel5');
foreach ($types as $type) {
    $reader = PHPExcel_IOFactory::createReader($type);
    if ($reader->canRead($file_path)) {
        $valid = true;
        break;
    }
}

if ($valid) {
  // TODO: load file
  // e.g. PHPExcel_IOFactory::load($file_path)
} else {
  // TODO: show error message
}

Hope this help anyone with the same problem.



标签: php phpexcel