I'm programming something that allows users to store documents and pictures on a webserver, to be stored and retrieved later. When users upload files to my server, PHP tells me what filetype it is based on the extension. However, I'm afraid that users could rename a zip file as somezipfile.png and store it, thus keeping a zip file on my server. Is there any reasonable way to open an uploaded file and "check" to see if it truly is of the said filetype?
问题:
回答1:
Magic number. If you can read first few bytes of a binary file you can know what kind of file it is.
回答2:
Check out the FileInfo PECL extension for PHP, which can do the MIME magic lookups for you.
回答3:
Sort of. Most file types have some bytes reserved for marking them so that you don't have to rely on the extension. The site http://wotsit.org is a great resource for finding this out for a particular type.
If you are on a unix system, I believe that the file command doesn't rely on the extension, so you could shell out to it if you don't want to write the byte checking code.
For PNG (http://www.w3.org/TR/PNG-Rationale.html)
The first eight bytes of a PNG file always contain the following values:
(decimal) 137 80 78 71 13 10 26 10
(hexadecimal) 89 50 4e 47 0d 0a 1a 0a
(ASCII C notation) \211 P N G \r \n \032 \n
回答4:
If you are only dealing with images, then getimagesize() should distinguish a valid image from a fake one.
$ php -r 'var_dump(getimagesize("b&n.jpg"));'
array(7) {
[0]=>
int(200)
[1]=>
int(200)
[2]=>
int(2)
[3]=>
string(24) "width="200" height="200""
["bits"]=>
int(8)
["channels"]=>
int(3)
["mime"]=>
string(10) "image/jpeg"
}
$ php -r 'var_dump(getimagesize("/etc/passwd"));'
bool(false)
A false value from getimagesize is not an image.
回答5:
Many filetypes have "magic numbers" at the beginning of the file to identify them, You can read some bytes from the front of the file and compare them to a list of known magic numbers.
回答6:
On a unix system, capturing the output from the 'file' command should provide adequate info.
回答7:
For an exact answer on how you could quickly do this in PHP, check out this question: How do I find the mime-type of a file with php?
回答8:
As a side note I ran into a similar problem where I had to do my own type checking. The front end interface to my application was done in flash. The files were being passed through flash to a php script. When I was attempting to do a MIME type check using php the type always returned was application/octetstream because it was coming from flash.
I had to implement a magic numbers type paradigm. I simply created an xml file that held the file type along with some defining patterns found within the beginning of the file. Once the file reached the server I did some pattern matching with the xml file and then accepted or rejected the file. I didn't noticed any real performance decrease either which I was expecting.
This is just a side note to anyone who may be using flash as there front end and trying to type check the file once it is uploaded.
回答9:
As well as identifying the filetype, you might want to watch out for files with other files embedded or appended to them. This will unfortunately require a more indepth analysis of the file contents than just using "magic numbers".
For example, http://quantumrook.wordpress.com/2007/06/06/hide-a-rar-file-in-a-jpg-file/ (this particular type of data hiding can be easily worked around by loading and resaving into a new file the actual image data .. others will be more difficult.)