Django directory upload get sub-directory names

Posted 2019-04-07 03:47

Question:

I am writing a Django app that uploads a directory of files through a form.

This is the form I am using, which allows a directory to be uploaded:

class FileFieldForm(forms.Form):
    file_field = forms.FileField(widget=forms.ClearableFileInput(attrs=
        {'multiple': True, 'webkitdirectory': True, 'directory': True}))

This is the raw post payload:

------WebKitFormBoundaryPbO3HkrKGbBwgD3sd1
Content-Disposition: form-data; name="csrfmiddlewaretoken"

F575Bgl4U9dzgwePPeSW2ISZKk5c3CnRoqFasdasD0Hep6nD0LnAAObXbF92SUa96NbO2
------WebKitFormBoundaryPbO3HkrKGbBwgDsd31
Content-Disposition: form-data; name="file_field";
filename="MainDir/SubDir1/1.jpg"
Content-Type: image/jpeg


------WebKitFormBoundaryPbOasd3HkrKGbBwgD31
Content-Disposition: form-data; name="file_field";
filename="MainDir/SubDir2/2.jpg"
Content-Type: image/jpeg

This is the view that handles the form:

class FileFieldView(FormView):
    form_class = FileFieldForm
    template_name = 'upload.html'
    success_url = 'upload'

    def post(self, request, *args, **kwargs):
        form_class = self.get_form_class()
        form = self.get_form(form_class)
        files = request.FILES.getlist('file_field')
        if form.is_valid():
            for f in files:
                pprint("Name of file is " + f._get_name() + ' ' + f.field_name, sys.stderr)
                new_file = FileModel(file=f)
                new_file.save()
            return self.form_valid(form)
        else:
            return self.form_invalid(form)

The problem is that the name of the file object in Django does not include the sub-directory names. I assume one of the middleware handling the request parses the filename and strips the sub-directory names from it. Is there a way to get the original filename, including the directory and sub-directory names?

Answer 1:

I believe this is simply how Django is implemented. Please refer to Django's upload handler documentation.

Django's default upload handlers are MemoryFileUploadHandler and TemporaryFileUploadHandler. Both use UploadedFile to represent the files, and its _set_name method keeps only the base name of the file.

There is even a comment explaining why it takes the basename:

def _set_name(self, name):
    # Sanitize the file name so that it can't be dangerous.
    if name is not None:
        # Just use the basename of the file -- anything else is dangerous.
        name = os.path.basename(name)

        # File names longer than 255 characters can cause problems on older OSes.
        if len(name) > 255:
            name, ext = os.path.splitext(name)
            ext = ext[:255]
            name = name[:255 - len(ext)] + ext

    self._name = name
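
The effect of this sanitizing on a directory upload can be reproduced outside Django. Below is a minimal sketch that copies the logic of _set_name into a standalone function; the name sanitize_upload_name is made up for illustration and is not part of Django.

```python
import os


def sanitize_upload_name(name):
    """Standalone copy of the _set_name logic (hypothetical helper, not a Django API)."""
    if name is not None:
        # Only the last path component survives.
        name = os.path.basename(name)
        # Cap the total length at 255 characters while keeping the extension.
        if len(name) > 255:
            name, ext = os.path.splitext(name)
            ext = ext[:255]
            name = name[:255 - len(ext)] + ext
    return name


print(sanitize_upload_name("MainDir/SubDir1/1.jpg"))  # 1.jpg
```

This is why "MainDir/SubDir1/1.jpg" from the payload shows up as just "1.jpg" on the file object.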

But I think you can write your own upload handler that does not take the basename and behaves as you want. Django's documentation has some information on how to write a custom upload handler.

Then you need to list your handler in the FILE_UPLOAD_HANDLERS setting.
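
As a rough sketch, the setting might look like the fragment below in settings.py. The dotted paths are placeholders: the module myapp.upload_handlers and both class names are hypothetical and must point at wherever your custom handlers are actually defined.

```python
# settings.py (fragment)
# Django's default lists MemoryFileUploadHandler first and
# TemporaryFileUploadHandler second; this sketch keeps the same order.
FILE_UPLOAD_HANDLERS = [
    "myapp.upload_handlers.CustomMemoryFileUploadHandler",     # hypothetical path
    "myapp.upload_handlers.CustomTemporaryFileUploadHandler",  # hypothetical path
]
```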



Answer 2:

Expanding on the previous answer, one way to keep the full path from a directory upload is to replace the slashes (\ and /) in the file path, which would otherwise be sanitized away, with hyphens:

from django.core.files.uploadhandler import (
    MemoryFileUploadHandler,
    TemporaryFileUploadHandler,
)
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt, csrf_protect


class CustomMemoryFileUploadHandler(MemoryFileUploadHandler):
    def new_file(self, *args, **kwargs):
        # args[0] is the field name, args[1] is the file name sent by the browser
        args = (args[0], args[1].replace('/', '-').replace('\\', '-')) + args[2:]
        super().new_file(*args, **kwargs)

class CustomTemporaryFileUploadHandler(TemporaryFileUploadHandler):
    def new_file(self, *args, **kwargs):
        args = (args[0], args[1].replace('/', '-').replace('\\', '-')) + args[2:]
        super().new_file(*args, **kwargs)

@csrf_exempt
def my_view(request):
    # Replace the upload handlers before anything reads request.POST or request.FILES.
    # This depends on the FILE_UPLOAD_HANDLERS setting; the code below mirrors the default in Django 1.10.
    request.upload_handlers = [CustomMemoryFileUploadHandler(request), CustomTemporaryFileUploadHandler(request)]
    return _my_view(request)

@csrf_protect
def _my_view(request):
    # If the path of the uploaded file was "test/abc.jpg", here it will be "test-abc.jpg".
    # request.FILES is keyed by field name; 'file_field' is the name used in the question.
    blah = request.FILES['file_field'].name
    return HttpResponse(blah)
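
Why the replacement helps can be checked without Django: once the separators are gone, taking the basename no longer discards anything. Here is a small sketch of the same transformation, using a hypothetical helper named flatten_upload_path. Keep in mind that the mapping cannot be reversed reliably if the original file or directory names already contain hyphens.

```python
import os


def flatten_upload_path(file_name, separator="-"):
    """Replace both kinds of slash so the whole path survives a later basename() call.

    Hypothetical helper for illustration; it performs the same replacement
    as the new_file overrides in this answer.
    """
    return file_name.replace("/", separator).replace("\\", separator)


flat = flatten_upload_path("MainDir/SubDir1/1.jpg")
print(flat)                    # MainDir-SubDir1-1.jpg
print(os.path.basename(flat))  # MainDir-SubDir1-1.jpg (nothing left to strip)
```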


Answer 3:

In addition to the previous answers, there is another way that may be useful to someone. If the request contains only one file, you can get the original filename exactly as it appears in the multipart/form-data payload without overriding the handlers.

MemoryFileUploadHandler and TemporaryFileUploadHandler (which are used by default; see "Built-in upload handlers" in Django's docs) inherit from the FileUploadHandler class. Such objects have a file_name attribute (see Django's code). It stores the full name of one of the files from the request (which one cannot be known in advance). But if the request always contains exactly one file, this approach works.

So the view will look like:

def your_view(request):
    file = request.FILES.get('file_field')
    full_file_name = request.upload_handlers[0].file_name # e.g. 'MainDir/SubDir1/1.jpg'
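
The single-file restriction follows from the name being kept in one attribute on the handler rather than on each file. The toy class below is only a stand-in written for this illustration, not Django code; it assumes, as the base handler appears to do, that new_file stores the incoming name on the handler instance each time it is called.

```python
class StandInHandler:
    """Toy stand-in for an upload handler (not Django code).

    It mimics only the behaviour relevant here: new_file() stores the
    incoming file name on the handler instance.
    """

    def __init__(self):
        self.file_name = None

    def new_file(self, field_name, file_name):
        self.file_name = file_name


handler = StandInHandler()
for name in ["MainDir/SubDir1/1.jpg", "MainDir/SubDir2/2.jpg"]:
    handler.new_file("file_field", name)

# Only one name is left once every part has been processed.
print(handler.file_name)  # MainDir/SubDir2/2.jpg
```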

For multi-file uploads, we can override the handlers so that each entry in request.FILES becomes a (file, full name) tuple:

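from django.core.files.uploadhandler import (
    MemoryFileUploadHandler,
    TemporaryFileUploadHandler,
)
from django.views.decorators.csrf import csrf_exempt, csrf_protect


class NamedMemoryFileUploadHandler(MemoryFileUploadHandler):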
    def file_complete(self, file_size):
        in_memory_file = super().file_complete(file_size)
        if in_memory_file is None:
            return
        return in_memory_file, self.file_name


class NamedTemporaryFileUploadHandler(TemporaryFileUploadHandler):
    def file_complete(self, file_size):
        temporary_file = super().file_complete(file_size)
        if temporary_file is None:
            return
        return temporary_file, self.file_name

@csrf_exempt
def upload_files(request):
    request.upload_handlers = [
        NamedMemoryFileUploadHandler(request),
        NamedTemporaryFileUploadHandler(request),
    ]
    return _upload_files(request)


@csrf_protect
def _upload_files(request):
    # Each entry is a (file, full path) tuple, e.g.
    # [(<file1>, 'MainDir/SubDir1/1.jpg'), (<file2>, 'MainDir/SubDir2/2.jpg')]
    files = request.FILES.getlist("file")
    for tmp_file, full_path in files:
        ...
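
Once the full path is available, the sub-directory names the question asks for can be split out with the standard library. This is a minimal sketch; split_upload_path is a hypothetical helper, and the rejection of absolute paths and ".." components is an added precaution (the name comes from the client), not something the answers above do.

```python
from pathlib import PurePosixPath


def split_upload_path(full_path):
    """Return (list of directory names, file name) for a client-supplied relative path.

    Hypothetical helper. Raises ValueError for absolute paths or paths
    containing "..", because the value is untrusted input.
    """
    path = PurePosixPath(full_path.replace("\\", "/"))
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"unsafe upload path: {full_path!r}")
    return list(path.parts[:-1]), path.name


dirs, name = split_upload_path("MainDir/SubDir1/1.jpg")
print(dirs, name)  # ['MainDir', 'SubDir1'] 1.jpg
```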