I have a Python web application written in bottlepy. Its only purpose is to let people upload large files that are then processed (processing takes approximately 10-15 minutes).
The upload code is rather simple:
import os
from bottle import route, request, abort

@route('/upload', method='POST')
def upload_file():
    uploadfile = request.files.get('fileToUpload')
    if not uploadfile:
        abort(500, 'No file selected for upload')
    name, ext = os.path.splitext(uploadfile.filename)
    if ext not in ['.zip', '.gz']:
        abort(500, 'File extension not allowed')
    try:
        # save() raises IOError if the target file already exists
        uploadfile.save('./files')
        process_file(uploadfile.filename)  # this function is not yet implemented
        return "uploaded file '%s' for processing" % uploadfile.filename
    except IOError as e:
        abort(409, "File already exists.")
I plan to deploy this application using uWSGI (however, if another technology is better suited to the purpose, it's not set in stone).
Because of this I have some questions regarding the use of uWSGI for such a purpose:
- If the file upload takes minutes, how will uWSGI be capable of handling other clients without blocking?
- Is there any way the processing can be offloaded using built-in functionality in uWSGI, so that the user gets a response after the upload and can query for the processing status?
Thank you for any help.
It will block. A solution is to put a web server like NGINX in front of uWSGI that pre-buffers the POST request. The file upload will then be bound to an NGINX handler until it is complete, and only then passed to the uWSGI handler.
You need to create a task queue system to offload the processing from the web handler. This is a common best practice; just look around for Python task queues. For built-in functionality, it really depends on the task you need to offload. You can use the built-in uWSGI spooler or the uWSGI mules. These are very good alternatives to a typical task queue (like the very famous Celery) but have limitations. Just try it yourself in your scenario.
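To make the offloading idea concrete, here is a minimal sketch of the pattern using only the Python standard library. It is not the uWSGI spooler or Celery, just an illustration of "enqueue, return immediately, poll for status". The names job_status, enqueue, get_status and worker are made up for this example, and process_file is a stand-in for the real processing step. Keep in mind that an in-process queue like this is lost on restart and is not shared between multiple uWSGI worker processes, which is exactly the gap the spooler or an external task queue is meant to fill. Background threads under uWSGI also require the --enable-threads option.

```python
import queue
import threading
import uuid

# Hypothetical in-memory registry: job_id -> "queued" | "processing" | "done" | "failed"
job_status = {}
status_lock = threading.Lock()
jobs = queue.Queue()


def process_file(filename):
    """Placeholder for the real 10-15 minute processing step."""
    return None


def set_status(job_id, state):
    with status_lock:
        job_status[job_id] = state


def worker():
    """Pull jobs off the queue one at a time so web handlers never wait on them."""
    while True:
        job_id, filename = jobs.get()
        set_status(job_id, "processing")
        try:
            process_file(filename)
            set_status(job_id, "done")
        except Exception:
            set_status(job_id, "failed")
        finally:
            jobs.task_done()


def enqueue(filename):
    """Call from the upload handler instead of process_file(); returns at once."""
    job_id = uuid.uuid4().hex
    set_status(job_id, "queued")
    jobs.put((job_id, filename))
    return job_id


def get_status(job_id):
    """Call from a status route; returns 'unknown' for ids that were never issued."""
    with status_lock:
        return job_status.get(job_id, "unknown")


# A single daemon thread does the processing in the background.
threading.Thread(target=worker, daemon=True).start()
```

In the upload handler you would then call enqueue(uploadfile.filename) and return the job id to the client, and a second route (for example /status/<job_id>, also an assumption) would return get_status(job_id).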