How does cgi.FieldStorage store files?

Posted 2019-02-09 10:47

Question:

So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.

At first it seemed that it just stores the whole file in memory. I thought that should be easy to test: a big file should clog up the memory! It didn't. Still, when I request the file, I get a string, not an iterator, file object or anything like that.

I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!

Here's the code I've used:

import cgi
from wsgiref.simple_server import make_server

def app(environ,start_response):
    start_response('200 OK',[('Content-Type','text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
    <input type="file" name="failas" />
    <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'],environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    return [output]  # WSGI expects an iterable of body strings


if __name__ == '__main__' :
    httpd = make_server('',8000,app)
    print 'Serving'
    httpd.serve_forever()

Thanks in advance! :)

Answer 1:

The cgi module documentation has a paragraph on how to handle file uploads:

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:

fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1

Regarding your example, getfirst() is just a variant of getvalue(), so it also reads the whole file into memory as a string. Try replacing

f = fs.getfirst('failas')

with

f = fs['failas'].file

This will return a file-like object that is readable "at leisure".
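As a minimal sketch of reading "at leisure", the file object can be copied to disk in fixed-size chunks so that the whole upload never sits in memory at once. The function name save_upload, the chunk size and the demo file name are assumptions made for illustration, and the BytesIO object only stands in for fs['failas'].file so the snippet runs without a server.

```python
import io
import os
import shutil
import tempfile


def save_upload(fileobj, dest_path, chunk_size=64 * 1024):
    """Copy a readable binary file-like object to dest_path in chunks.

    Only about chunk_size bytes are held in memory at any one time.
    """
    with open(dest_path, "wb") as dest:
        shutil.copyfileobj(fileobj, dest, chunk_size)
    return dest_path


# Demo with an in-memory stand-in for the uploaded file object.
demo_dir = tempfile.mkdtemp()
fake_upload = io.BytesIO(b"x" * 200000)
demo_path = save_upload(fake_upload, os.path.join(demo_dir, "upload.bin"))
```

In the question's app, the first argument would be fs['failas'].file instead of the BytesIO stand-in.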



Answer 2:

The best way is NOT to read the file yourself at all (not even one line at a time, as gimel suggested).

Instead, subclass FieldStorage and override its make_file method. make_file is called when the field being parsed is a file upload.

For reference, the default make_file looks like this:

def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it.  The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method.  If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.

    """
    import tempfile
    return tempfile.TemporaryFile("w+b")

Rather than creating a temporary file, your override can create a permanent file wherever you want.
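To make the difference concrete, here is a rough sketch of the kind of file an overriding make_file could return. It is written as a standalone helper so it runs without the cgi module; the name make_permanent_file, the ".upload" suffix and the temporary upload_dir are assumptions for illustration only.

```python
import os
import tempfile


def make_permanent_file(directory):
    """Return (file, path): a binary file opened for reading and writing.

    Unlike tempfile.TemporaryFile, the file keeps its name and is not
    removed when closed, so it can be moved or reopened later.
    """
    fd, path = tempfile.mkstemp(dir=directory, suffix=".upload")
    return os.fdopen(fd, "w+b"), path


# Use the file the way the docstring above describes: write, seek(0), read.
upload_dir = tempfile.mkdtemp()  # stand-in for a real upload directory
f, path = make_permanent_file(upload_dir)
f.write(b"uploaded bytes")
f.seek(0)
data = f.read()
f.close()
still_there = os.path.exists(path)  # the file survives close()
```

An override would return only the file object and would need to remember the path somewhere, for example on the instance.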



Answer 3:

Building on the answer by @hasanatkazmi (used in a Twisted app), I ended up with something like:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import cgi
import tempfile


class PredictableStorage(cgi.FieldStorage):
    """FieldStorage that records where an uploaded file was written."""

    def __init__(self, *args, **kwargs):
        # Optional destination path; stays None until a file is created.
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            # No path given: keep a named temp file that survives close().
            f = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = f.name
            return f
        return open(self.path, 'w+b')

Be warned that the cgi module does not always create the file. According to these cgi.py lines, it is only created once the content exceeds 1000 bytes:

if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')

So you have to check whether the file was actually created by testing the custom class's path attribute, like so:

import tempfile

if file_field.path:
    # cgi already wrote the upload to file_field.path; use it as is.
    pass
else:
    # Small upload kept in memory: write it to a named temporary file.
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
        # Save f.name if you need the path later.

If Content-Length is also set for the field, which seems to be rare, cgi should create the file as well.

That's it. This way you can store the file predictably, decreasing the memory usage footprint of your app.
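The 1000-byte threshold quoted above can be illustrated with a simplified stand-in. This is not the cgi code itself, just a sketch of the same idea under the assumption that data is buffered in memory first and handed to a make_file callable once it grows past the limit. The class name SpillBuffer and the on_disk flag are hypothetical.

```python
import io
import tempfile

THRESHOLD = 1000  # the byte limit from the cgi.py lines quoted above


class SpillBuffer(object):
    """Simplified stand-in for the buffering logic; not the cgi code itself."""

    def __init__(self, make_file):
        self.make_file = make_file      # called only when data outgrows memory
        self._memory = io.BytesIO()     # small payloads stay here
        self.file = self._memory
        self.on_disk = False

    def write(self, line):
        if not self.on_disk and self._memory.tell() + len(line) > THRESHOLD:
            # Switch to a real file and carry over what was buffered so far.
            self.file = self.make_file()
            self.file.write(self._memory.getvalue())
            self.on_disk = True
        self.file.write(line)


small = SpillBuffer(tempfile.TemporaryFile)
small.write(b"a" * 600)            # 600 bytes: stays in memory

large = SpillBuffer(tempfile.TemporaryFile)
large.write(b"a" * 600)
large.write(b"b" * 600)            # 1200 bytes total: spills to a file
large.file.seek(0)
large_data = large.file.read()
```

This is why a small upload never triggers make_file, and why the path check above is needed.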



Tags: python cgi wsgi