How do I tell if someone's faking a filetype?

2020-06-03 01:16发布

问题:

I'm programming something that allows users to store documents and pictures on a webserver, to be stored and retrieved later. When users upload files to my server, PHP tells me what filetype it is based on the extension. However, I'm afraid that users could rename a zip file as somezipfile.png and store it, thus keeping a zip file on my server. Is there any reasonable way to open an uploaded file and "check" to see if it truly is of the said filetype?

回答1:

Magic number. If you can read first few bytes of a binary file you can know what kind of file it is.



回答2:

Check out the FileInfo PECL extension for PHP, which can do the MIME magic lookups for you.



回答3:

Sort of. Most file types have some bytes reserved for marking them so that you don't have to rely on the extension. The site http://wotsit.org is a great resource for finding this out for a particular type.

If you are on a unix system, I believe that the file command doesn't rely on the extension, so you could shell out to it if you don't want to write the byte checking code.

For PNG (http://www.w3.org/TR/PNG-Rationale.html)

The first eight bytes of a PNG file always contain the following values:

(decimal) 137 80 78 71 13 10 26 10

(hexadecimal) 89 50 4e 47 0d 0a 1a 0a

(ASCII C notation) \211 P N G \r \n \032 \n



回答4:

If you are only dealing with images, then getimagesize() should distinguish a valid image from a fake one.

$ php -r 'var_dump(getimagesize("b&n.jpg"));'
array(7) {
  [0]=>
  int(200)
  [1]=>
  int(200)
  [2]=>
  int(2)
  [3]=>
  string(24) "width="200" height="200""
  ["bits"]=>
  int(8)
  ["channels"]=>
  int(3)
  ["mime"]=>
  string(10) "image/jpeg"
}

$ php -r 'var_dump(getimagesize("/etc/passwd"));'
bool(false)

A false value from getimagesize is not an image.



回答5:

Many filetypes have "magic numbers" at the beginning of the file to identify them, You can read some bytes from the front of the file and compare them to a list of known magic numbers.



回答6:

On a unix system, capturing the output from the 'file' command should provide adequate info.



回答7:

For an exact answer on how you could quickly do this in PHP, check out this question: How do I find the mime-type of a file with php?



回答8:

As a side note I ran into a similar problem where I had to do my own type checking. The front end interface to my application was done in flash. The files were being passed through flash to a php script. When I was attempting to do a MIME type check using php the type always returned was application/octetstream because it was coming from flash.

I had to implement a magic numbers type paradigm. I simply created an xml file that held the file type along with some defining patterns found within the beginning of the file. Once the file reached the server I did some pattern matching with the xml file and then accepted or rejected the file. I didn't noticed any real performance decrease either which I was expecting.

This is just a side note to anyone who may be using flash as there front end and trying to type check the file once it is uploaded.



回答9:

As well as identifying the filetype, you might want to watch out for files with other files embedded or appended to them. This will unfortunately require a more indepth analysis of the file contents than just using "magic numbers".

For example, http://quantumrook.wordpress.com/2007/06/06/hide-a-rar-file-in-a-jpg-file/ (this particular type of data hiding can be easily worked around by loading and resaving into a new file the actual image data .. others will be more difficult.)