Django ImageField validation (is it sufficient)?

2020-02-26 01:05发布

问题:

I have a lot of user uploaded content and I want to validate that uploaded image files are not, in fact, malicious scripts. In the Django documentation, it states that ImageField:

"Inherits all attributes and methods from FileField, but also validates that the uploaded object is a valid image."

Is that totally accurate? I've read that compressing or otherwise manipulating an image file is a good validation test. I'm assuming that PIL does something like this....

Will ImageField go a long way toward covering my image upload security?

回答1:

Django validates the image uploaded via form using PIL. See https://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L519

try:
    # load() is the only method that can spot a truncated JPEG,
    #  but it cannot be called sanely after verify()
    trial_image = Image.open(file)
    trial_image.load()

    # Since we're about to use the file again we have to reset the
    # file object if possible.
    if hasattr(file, 'reset'):
        file.reset()

    # verify() is the only method that can spot a corrupt PNG,
    #  but it must be called immediately after the constructor
    trial_image = Image.open(file)
    trial_image.verify()
 ...
 except Exception: # Python Imaging Library doesn't recognize it as an image
    raise ValidationError(self.error_messages['invalid_image'])

PIL documentation states the following about verify():

Attempts to determine if the file is broken, without actually decoding the image data. If this method finds any problems, it raises suitable exceptions. This method only works on a newly opened image; if the image has already been loaded, the result is undefined. Also, if you need to load the image after using this method, you must reopen the image file.

You should also note that ImageField is only validated when uploaded using form. If you save the model your self (e.g. using some kind of download script), the validation is not performed.



回答2:

Another test is with the file command. It checks for the presence of "magic numbers" in the file to determine its type. On my system, the file package includes libmagic as well as a ctypes-based wrapper /usr/lib64/python2.7/site-packages/magic.py. It looks like you use it like:

import magic

ms = magic.open(magic.MAGIC_NONE)
ms.load()
type =  ms.file("/path/to/some/file")
print type

f = file("/path/to/some/file", "r")
buffer = f.read(4096)
f.close()

type = ms.buffer(buffer)
print type

ms.close()

(Code from here.)


As to your original question: "Read the Source, Luke."

django/core/files/images.py:

"""
Utility functions for handling images.

Requires PIL, as you might imagine.
"""

from django.core.files import File

class ImageFile(File):
    """
    A mixin for use alongside django.core.files.base.File, which provides
    additional features for dealing with images.
    """
    def _get_width(self):
        return self._get_image_dimensions()[0]
    width = property(_get_width)

    def _get_height(self):
        return self._get_image_dimensions()[1]
    height = property(_get_height)

    def _get_image_dimensions(self):
        if not hasattr(self, '_dimensions_cache'):
            close = self.closed
            self.open()
            self._dimensions_cache = get_image_dimensions(self, close=close)
        return self._dimensions_cache

def get_image_dimensions(file_or_path, close=False):
    """
    Returns the (width, height) of an image, given an open file or a path.  Set
    'close' to True to close the file at the end if it is initially in an open
    state.
    """
    # Try to import PIL in either of the two ways it can end up installed.
    try:
        from PIL import ImageFile as PILImageFile
    except ImportError:
        import ImageFile as PILImageFile

    p = PILImageFile.Parser()
    if hasattr(file_or_path, 'read'):
        file = file_or_path
        file_pos = file.tell()
        file.seek(0)
    else:
        file = open(file_or_path, 'rb')
        close = True
    try:
        while 1:
            data = file.read(1024)
            if not data:
                break
            p.feed(data)
            if p.image:
                return p.image.size
        return None
    finally:
        if close:
            file.close()
        else:
            file.seek(file_pos)

So it looks like it just reads the file 1024 bytes at a time until PIL says it's an image, then stops. This obviously does not integrity-check the entire file, so it really depends on what you mean by "covering my image upload security": illicit data could be appended to an image and passed through your site. Someone could DOS your site by uploading a lot of junk or a really big file. You could be vulnerable to an injection attack if you don't check any uploaded captions or make assumptions about the image's uploaded filename. And so on.