DOCX file type in PHP finfo_file is application/zip

Posted 2019-01-18 03:36

Question:

Hello, I'm trying to validate the type of an uploaded file with the finfo_file function.

But when a .docx file is uploaded, the reported file type is:

application/zip

instead of:

application/vnd.openxmlformats-officedocument.wordprocessingml.document

How can I change this behavior?

Answer 1:

As far as I know, vendor-specific MIME types (the vnd. tree) are registered by individual vendors rather than each being defined in its own RFC, so the fileinfo database does not necessarily cover them. A .docx file is a zipped XML format, which is why finfo_file() returns application/zip (which is technically correct). You could unzip the file and test the MIME type of its contents, but that would only give you XML (also correct) plus whatever other files the document uses. To distinguish between different XML-based formats, fileinfo would have to analyze the content and know what each format looks like, which goes too far.



Answer 2:

This works on Debian. Add this to /etc/magic:

#------------------------------------------------------------------------------
# $File: msooxml,v 1.1 2011/01/25 18:36:19 christos Exp $
# msooxml:  file(1) magic for Microsoft Office XML
# From: Ralf Brown <ralf.brown@gmail.com>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
#   archive.  The first member file is normally "[Content_Types].xml".
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
#   file of ePub or OpenDocument, we'll have to scan for a filename
#   which can distinguish between the three types

# start by checking for ZIP local file header signature
0               string          PK\003\004
# make sure the first file is correct
>0x1E           string          [Content_Types].xml
# skip to the second local file header
#   since some documents include a 520-byte extra field following the file
#   header,  we need to scan for the next header
>>(18.l+49)     search/2000     PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
#   520-byte extra field following the file header
>>>&26          search/1000     PK\003\004
# and check the subdirectory name to determine which type of OOXML
#   file we have
>>>>&26         string          word/           Microsoft Word 2007+
!:mime application/msword
>>>>&26         string          ppt/            Microsoft PowerPoint 2007+
!:mime application/vnd.ms-powerpoint
>>>>&26         string          xl/             Microsoft Excel 2007+
!:mime application/vnd.ms-excel
>>>>&26         default         x               Microsoft OOXML
!:strength +10

Then tell PHP to use /etc/magic as its database:

$finfo = finfo_open(FILEINFO_MIME,"/etc/magic");
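
As a rough sketch of how that handle might be used for validation: the helper below is hypothetical (the name detect_mime and its parameters are not from the answer) and it assumes the rules above are installed at /etc/magic. It uses FILEINFO_MIME_TYPE rather than FILEINFO_MIME so the result does not carry a "; charset=..." suffix. With the rules exactly as quoted, a .docx would be reported as application/msword, since that is what the !:mime line under word/ says.

```php
<?php
// Hypothetical helper: returns the MIME type reported by the given magic
// database, or null when the database or the file cannot be read.
// Passing null as $magicFile falls back to PHP's bundled database.
function detect_mime(string $path, ?string $magicFile = '/etc/magic'): ?string
{
    if (!is_file($path)) {
        return null; // nothing to inspect
    }
    // FILEINFO_MIME_TYPE yields only "type/subtype", without the charset
    // suffix that FILEINFO_MIME appends.
    $finfo = @finfo_open(FILEINFO_MIME_TYPE, $magicFile);
    if ($finfo === false) {
        return null; // magic database missing or unparsable
    }
    $mime = finfo_file($finfo, $path);
    return $mime === false ? null : $mime;
}
```

A call such as detect_mime($_FILES['upload']['tmp_name']) would then return whatever type the custom rules assign; the 'upload' field name here is only an example.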


Answer 3:

This is because a DOCX is a ZIP file:

An Office Open XML file is a ZIP-compatible OPC package containing XML documents and other resources.

Like OpenOffice files, these documents are ZIPs containing various resources in a structured and well-defined manner. So when you try to identify the file content, the first thing you see is that it is a ZIP file. You would then need to look inside the ZIP to decide whether it's a DOCX or an OpenOffice file.

As an alternative, you could look at the file extension: if the file is identified as a ZIP and the extension happens to be .docx, then you can assume it is an OOXML file. (A legacy .doc file is a different, non-ZIP binary format, so this shortcut does not apply to it.)
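
A minimal sketch of that extension-based shortcut, assuming PHP 7 or later. The function name looks_like_ooxml and the extension list are hypothetical, chosen for illustration. Since the file name comes from the client, this is a weak check and should probably not be the only validation step.

```php
<?php
// Hypothetical helper: treats a file as OOXML when fileinfo reports a ZIP
// (or already reports an OOXML type) and the original file name ends in a
// known OOXML extension.
function looks_like_ooxml(string $path, string $originalName): bool
{
    $ooxmlExtensions = ['docx', 'xlsx', 'pptx']; // illustrative list only
    $extension = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!in_array($extension, $ooxmlExtensions, true) || !is_file($path)) {
        return false;
    }
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mime = $finfo ? finfo_file($finfo, $path) : false;
    if ($mime === false) {
        return false;
    }
    // Older magic databases report plain ZIP; newer ones may already
    // report the specific OOXML type.
    return $mime === 'application/zip'
        || strpos($mime, 'application/vnd.openxmlformats-officedocument.') === 0;
}
```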



Answer 4:

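On Apache, add this to .htaccess so that .docx and the other Office file types are served with the correct MIME type. Note that this changes the Content-Type header Apache sends when serving these files; it does not change what finfo_file() reports for an upload: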

AddType application/vnd.ms-word.document.macroEnabled.12 docm
AddType application/vnd.openxmlformats-officedocument.wordprocessingml.document docx
AddType application/vnd.openxmlformats-officedocument.wordprocessingml.template dotx
AddType application/vnd.ms-powerpoint.template.macroEnabled.12 potm
AddType application/vnd.openxmlformats-officedocument.presentationml.template potx
AddType application/vnd.ms-powerpoint.addin.macroEnabled.12 ppam
AddType application/vnd.ms-powerpoint.slideshow.macroEnabled.12 ppsm
AddType application/vnd.openxmlformats-officedocument.presentationml.slideshow ppsx
AddType application/vnd.ms-powerpoint.presentation.macroEnabled.12 pptm
AddType application/vnd.openxmlformats-officedocument.presentationml.presentation pptx
AddType application/vnd.ms-excel.addin.macroEnabled.12 xlam
AddType application/vnd.ms-excel.sheet.binary.macroEnabled.12 xlsb
AddType application/vnd.ms-excel.sheet.macroEnabled.12 xlsm
AddType application/vnd.openxmlformats-officedocument.spreadsheetml.sheet xlsx
AddType application/vnd.ms-excel.template.macroEnabled.12 xltm
AddType application/vnd.openxmlformats-officedocument.spreadsheetml.template xltx


Answer 5:

We had the same problem with PHP 5.3. It works fine under PHP 7.2, where I get application/vnd.openxmlformats-officedocument.wordprocessingml.document for my .docx file.

To make sure you have a .docx file under PHP 5.3, you can check the content type declared in the [Content_Types].xml file inside the archive.
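
One way this check might look, as a sketch only: it assumes the zip extension (ZipArchive) is available, and the function name is_docx is hypothetical. It avoids scalar type declarations so that it should also run on older PHP versions. The string it searches for is the content type that a WordprocessingML main document part declares in [Content_Types].xml.

```php
<?php
// Hypothetical helper: returns true when the ZIP at $path contains a
// [Content_Types].xml that declares a WordprocessingML main document part.
function is_docx($path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return false; // not a readable ZIP archive
    }
    $contentTypes = $zip->getFromName('[Content_Types].xml');
    $zip->close();
    if ($contentTypes === false) {
        return false; // a ZIP, but not an OPC package
    }
    $needle = 'application/vnd.openxmlformats-officedocument'
            . '.wordprocessingml.document.main+xml';
    return strpos($contentTypes, $needle) !== false;
}
```

A plain substring search keeps the sketch short; parsing the XML and inspecting the Override elements would be stricter.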