uWSGI for uploading and processing files

Posted 2019-02-11 02:31

I have a Python web application written in bottlepy. Its only purpose is to allow people to upload large files that will then be processed (processing takes approximately 10-15 minutes).

The upload code is rather simple:

import os

from bottle import abort, request, route

@route('/upload', method='POST')
def upload_file():
  uploadfile = request.files.get('fileToUpload')
  if not uploadfile:
    abort(500, 'No file selected for upload')

  name,ext = os.path.splitext(uploadfile.filename)

  if ext not in ['.zip','.gz']:
    abort(500, 'File extension not allowed')

  try:
    uploadfile.save('./files')

    process_file(uploadfile.filename) #this function is not yet implemented

    return "uploaded file '%s' for processing" % uploadfile.filename
  except IOError as e:
    abort(409, "File already exists.")

I plan to deploy this application using uWSGI (however, if another technology is better suited to the purpose, it's not set in stone).

Because of this I have some questions regarding the use of uWSGI for such a purpose:

  1. If the file upload takes minutes, how will uWSGI be capable of handling other clients without blocking?
  2. Is there any way the processing can be offloaded using built-in functionality in uWSGI, so that the user gets a response after the upload and can query for the processing status?

Thank you for any help.

1 answer
三岁会撩人
Post #2 · 2019-02-11 02:41

If the file upload takes minutes, how will uWSGI be capable of handling other clients without blocking?

It will block. One solution is to put a web server such as NGINX in front of uWSGI to pre-buffer the POST request. The file upload is then bound to an NGINX handler until it is complete, and only then is the request passed to the uWSGI handler.

Is there any way the processing can be offloaded using built-in functionality in uWSGI, so that the user gets a response after the upload and can query for the processing status?

You need a task queue to move the processing out of the web handler. This is a common best practice, and there are many Python task queues to choose from. As for built-in functionality, it really depends on the task you need to offload. You can use the built-in uWSGI spooler or uWSGI mules. These are very good alternatives to a typical task queue (such as the well-known Celery), but they have limitations, so try them in your own scenario.
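To make the offloading pattern concrete, here is a minimal sketch using only the Python standard library. It is not the uWSGI spooler or mule API; it is an in-process stand-in that shows the shape of the idea: the upload handler enqueues a job and returns at once, a background worker does the processing, and a status lookup lets the client poll. The names `enqueue`, `get_status`, `job_status`, and the body of `process_file` are assumptions made up for illustration.

```python
import queue
import threading
import uuid

# Hypothetical in-memory registry: job_id -> "queued" | "processing" | "done" | "failed"
job_status = {}
status_lock = threading.Lock()
jobs = queue.Queue()


def process_file(filename):
    """Placeholder for the long-running processing step (hypothetical)."""
    return len(filename)


def set_status(job_id, status):
    with status_lock:
        job_status[job_id] = status


def worker():
    """Pull jobs off the queue one at a time and record the outcome."""
    while True:
        job_id, filename = jobs.get()
        set_status(job_id, "processing")
        try:
            process_file(filename)
            set_status(job_id, "done")
        except Exception:
            set_status(job_id, "failed")
        finally:
            jobs.task_done()


def enqueue(filename):
    """Called from the upload handler: register the job and return immediately."""
    job_id = uuid.uuid4().hex
    set_status(job_id, "queued")
    jobs.put((job_id, filename))
    return job_id


def get_status(job_id):
    """Called from a status endpoint; returns None for an unknown id."""
    with status_lock:
        return job_status.get(job_id)


# A single daemon thread acts as the worker in this sketch.
threading.Thread(target=worker, daemon=True).start()
```

In the upload handler you would call `enqueue(uploadfile.filename)` instead of `process_file(...)` and return the job id to the client, which can then poll a status route backed by `get_status`. Note the limits of this stand-in: the status dictionary lives in one process, so it is not shared between several uWSGI workers and is lost on restart, and application threads under uWSGI require the `enable-threads` option. Those limits are exactly why the spooler, mules, or an external task queue are the better fit for real deployments.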
