Large file downloads in CherryPy

Posted 2020-07-23 08:51

I'm hosting a file-access website with CherryPy, served through uWSGI and nginx on a Raspberry Pi. I've noticed that if a file is rather large (say, about a gigabyte), uWSGI reports that it was killed by signal 9. I worked around this with cherrypy.config.update({'tools.sessions.timeout': 1000000}), but that doesn't solve the problem; it's a hacky workaround that doesn't really work and mainly causes another problem by making the timeout very large. In addition, the browser cannot estimate the remaining time accurately: it hangs for a while (about 5 minutes on a wired connection) and then suddenly starts downloading rapidly.

It starts as: [screenshot: the file download]

Then goes to: [screenshot: the continued file download]

My download code is very simple, just consisting of this single line.

return cherrypy.lib.static.serve_file(path,"application/x-download",os.path.basename(path))

My previous download code didn't quite work out well.

f = file(path)
cherrypy.response.headers['Content-Type'] = getType(path)[0]
return f

Is there a way to remedy this?

1 answer
不美不萌又怎样
#2 · 2020-07-23 09:37

General consideration

First of all, I have to say that CherryPy -> uWSGI -> Nginx is quite a piled-up configuration for such a constrained environment. According to the author, it's safe to use CherryPy on its own for small applications when there's no special requirement. Putting Nginx in front adds a lot of flexibility, so it's usually beneficial, but since CherryPy's default deployment is plain HTTP, I strongly suggest staying with those two (and forgetting about WSGI altogether).

Second, you probably already know that your problem is likely session-related, considering the workaround you've tried. Here's a quote from the documentation about streaming the response body, which is what a file download is:

In general, it is safer and easier to not stream output. Therefore, streaming output is off by default. Streaming output and also using sessions requires a good understanding of how session locks work.

What it suggests is manual session lock management. Knowing how your application works should lead you to appropriate lock design.
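To illustrate why the lock matters, here is a toy model rather than CherryPy code: a plain threading.Lock stands in for the per-session lock, and the function names are hypothetical. It contrasts streaming while holding the lock with releasing it before the first chunk is sent. In CherryPy itself the equivalent would, as far as I recall, be setting 'tools.sessions.locking' to 'explicit' and calling cherrypy.session.acquire_lock() / cherrypy.session.release_lock() yourself; check the session documentation for your version before relying on that.

```python
import threading

# Stand-in for the per-session lock (an assumption for illustration only).
session_lock = threading.Lock()


def stream_holding_lock(chunks):
    """Yield chunks while the lock is held for the whole transfer."""
    with session_lock:
        for chunk in chunks:
            yield chunk


def stream_releasing_early(chunks):
    """Read session state under the lock, then stream without it."""
    with session_lock:
        authorised = True  # e.g. a permission flag read from the session
    if authorised:
        for chunk in chunks:
            yield chunk


def other_request_can_proceed():
    """Non-blocking probe: could a second request take the lock right now?"""
    acquired = session_lock.acquire(blocking=False)
    if acquired:
        session_lock.release()
    return acquired
```

With the first variant, any other request from the same session is blocked until the download finishes or is aborted; with the second, it can proceed as soon as the first chunk is on its way.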

And third, there's usually a way to shift the duty of handling a file download to the web server, basically by sending an appropriate header with the file name from the proxied application. In the case of nginx it's called X-Accel (the X-Accel-Redirect header). That way you avoid the hassle of lock management while still having session-restricted downloads.

Experiment

I've made a simple CherryPy app with two download options and put it behind Nginx. I played with a 1.3 GiB video file on a local Linux machine in Firefox and Chromium. There were three ways:

  1. Un-proxied download from CherryPy (http://127.0.0.1:8080/native/video.mp4),
  2. Proxied download from CherryPy via Nginx (http://test/native/video.mp4),
  3. X-accel download from CherryPy via Nginx (http://test/nginx/video.mp4).

With (1) and (2) I saw some minor strange behaviour in both Firefox and Chromium. With (1), on a Firefox instance that had been running for several days, I consistently got only ~5 MiB/s download speed and one fully loaded CPU core; on a fresh Firefox there was no such behaviour. With (2), Chromium produced a couple of interrupted, unfinished downloads (each time at around 1 GiB). But in general both browsers ran at about the HDD's physical throughput of 50-70 MiB/s.

With (3) I had no issues in either browser, with the same 50-70 MiB/s throughput, so in my small experiment it ended up being the most stable way.

Setup

app.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import os

import cherrypy


DownloadPath = '/home/user/Videos'

config = {
  'global' : {
    'server.socket_host' : '127.0.0.1',
    'server.socket_port' : 8080,
    'server.thread_pool' : 8
  }
}


class App:

  @cherrypy.expose
  def index(self):
    return 'Download test'

  @cherrypy.expose
  def native(self, name):
    basename = os.path.basename(name)
    filename = os.path.join(DownloadPath, basename)
    mime     = 'application/octet-stream'
    return cherrypy.lib.static.serve_file(filename, mime, basename)

  @cherrypy.expose
  def nginx(self, name):
    basename = os.path.basename(name)
    cherrypy.response.headers.update({
      'X-Accel-Redirect'    : '/download/{0}'.format(basename),
      'Content-Disposition' : 'attachment; filename={0}'.format(basename),
      'Content-Type'        : 'application/octet-stream'
    })


if __name__ == '__main__':
  cherrypy.quickstart(App(), '/', config)
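The nginx handler above puts the file name into the headers as-is, which can break on names containing spaces or non-ASCII characters. A minimal sketch of a helper that percent-encodes the name first; x_accel_headers and its internal_prefix parameter are hypothetical names, and it assumes nginx decodes the percent-encoded X-Accel-Redirect value, which is worth verifying against your nginx version:

```python
import os
from urllib.parse import quote


def x_accel_headers(name, internal_prefix='/download'):
    """Build response headers that hand a download over to nginx.

    Only the base name is kept, so '../' segments cannot escape the
    directory behind the internal location.
    """
    basename = os.path.basename(name)
    if not basename:
        raise ValueError('no file name given')
    encoded = quote(basename)  # percent-encode spaces, non-ASCII, etc.
    return {
        'X-Accel-Redirect': '{0}/{1}'.format(internal_prefix, encoded),
        # RFC 6266 extended form, so encoded names survive in the header
        'Content-Disposition': "attachment; filename*=UTF-8''{0}".format(encoded),
        'Content-Type': 'application/octet-stream',
    }
```

Inside the handler this would be used as cherrypy.response.headers.update(x_accel_headers(name)).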

app.conf

server {
  listen  80;

  server_name test;

  root /var/www/test/public;

  location /resource {
    # static files like images, css, js, etc.
    access_log off;
  }

  location / {
    proxy_pass         http://127.0.0.1:8080;
    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
  }

  location /download {
    internal;
    alias /home/user/Videos;
  }

}