Deploying Django with gunicorn and nginx

Posted 2019-01-20 21:15

Question:

This is a broad question but I'd like to get a canonical answer. I have been trying to deploy a Django site using gunicorn and nginx. After reading tons of tutorials I have been successful, but I can't be sure that the steps I followed are good enough to run a site without problems, or whether there are better ways to do it. That uncertainty is annoying.

That's why I'm looking for a very detailed and well-explained answer for newbies. I don't want to explain too much about what I know and what I don't know, since this could skew the answers a bit and make them less useful to other people. However, some things I'd like to see mentioned are:

  • What "setup" have you seen work best? I used virtualenv and moved my Django project inside this environment; however, I have seen other setups where there is one folder for virtual environments and another for projects.

  • How can I set things up in a way that allows several sites to be hosted on a single server?

  • Why do some people suggest using gunicorn_django -b 0.0.0.0:8000 and others suggest gunicorn_django -b 127.0.0.1:8000? I tested the latter on an Amazon EC2 instance but it didn't work, while the former worked without problems.

  • What is the logic behind the nginx config file? There are so many tutorials using drastically different configuration files that I'm confused about which one is better. For example, some people use alias /path/to/static/folder and others root /path/to/static/folder. Maybe you can share your preferred configuration file.

  • Why do we create a symlink between sites-available and sites-enabled in /etc/nginx?

  • Some best practices are as always welcomed :-)

Thanks

Answer 1:

What "setup" have you seen work best? I used virtualenv and moved my Django project inside this environment; however, I have seen other setups where there is one folder for virtual environments and another for projects.

virtualenv is a way to isolate Python environments; as such, it doesn't have a large part to play at deployment - however, during development and testing it is highly recommended, if not a requirement.

The value you would get from virtualenv is that it allows you to make sure that the correct versions of libraries are installed for the application. So it doesn't matter where you put the virtual environment itself. Just make sure you don't include it in your source code version control system.
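As a rough illustration of keeping the environment apart from the project, here is a minimal sketch. It assumes the standard-library venv module as a stand-in for the virtualenv tool, and the "envs" folder and "mysite" name are hypothetical.

```python
import os
import tempfile
import venv


def create_env(envs_root, name):
    """Create an isolated environment under envs_root and return its path."""
    env_dir = os.path.join(envs_root, name)
    # with_pip=False keeps the sketch fast and offline; a real deployment
    # would normally want pip available inside the environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


if __name__ == "__main__":
    # Hypothetical layout: environments live beside, not inside, the project,
    # so they never end up in the project's version control.
    with tempfile.TemporaryDirectory() as base:
        print(create_env(os.path.join(base, "envs"), "mysite"))
```

Because the environment is disposable, it can be deleted and recreated from a requirements file without touching the project checkout.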

The file system layout is not critical. You will see lots of articles extolling the virtues of directory layouts and even skeleton projects that you can clone as a starting point. I feel this is more of a personal preference than a hard requirement. Sure, it's nice to have; but unless you know why, it doesn't add any value to your deployment process - so don't do it because some blog recommends it unless it makes sense for your scenario. For example - no need to create a setup.py file if you don't have a private PyPI server that is part of your deployment workflow.

How can I set things up in a way that allows several sites to be hosted on a single server?

There are two things you need for a multi-site setup:

  1. A server that is listening on the public IP on port 80 and/or port 443 if you have SSL.
  2. A bunch of "processes" that are running the actual django source code.

People use nginx for #1 because it's a very fast proxy and it doesn't come with the overhead of a comprehensive server like Apache. You are free to use Apache if you are comfortable with it. There is no requirement that "for multiple sites, use nginx"; you just need a service that is listening on that port and knows how to redirect (proxy) to your processes running the actual Django code.

For #2 there are a few ways to start these processes. gevent/uwsgi are the most popular ones. The only thing to remember here is: do not use runserver in production.

Those are the absolute minimum requirements. Typically people add some sort of process manager to control all the "django servers" (#2) that are running. Here you'll see upstart and supervisor mentioned. I prefer supervisor as it doesn't need to take over the entire system (unlike upstart). However, again - this is not a hard requirement. You could perfectly well run a bunch of screen sessions and detach them. The downside is that, should your server restart, you would have to relaunch the screen sessions.

Personally I would recommend:

  1. Nginx for #1
  2. Take your pick between uwsgi and gunicorn - I use uwsgi.
  3. supervisor for managing the backend processes.
  4. Individual system accounts (users) for each application you are hosting.

The reason I recommend #4 is to isolate permissions; again, not a requirement.

Why do some people suggest using gunicorn_django -b 0.0.0.0:8000 and others suggest gunicorn_django -b 127.0.0.1:8000? I tested the latter on an Amazon EC2 instance but it didn't work, while the former worked without problems.

0.0.0.0 means "all IP addresses" - it's a meta address (that is, a placeholder address). 127.0.0.1 is a reserved address that always points to the local machine. That is why it's called "localhost". It is only reachable by processes running on the same system.
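The difference can be seen with a plain socket. This is a minimal sketch using only the standard library; the helper name bind_ephemeral is made up for illustration.

```python
import socket


def bind_ephemeral(host):
    """Bind a TCP socket to host on an OS-chosen port; return the bound address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        return sock.getsockname()


if __name__ == "__main__":
    # Loopback only: reachable solely from processes on this machine.
    print(bind_ephemeral("127.0.0.1"))
    # Wildcard: accepts connections arriving on any local interface.
    print(bind_ephemeral("0.0.0.0"))
```

This probably explains the EC2 result in the question: a process bound to 127.0.0.1 cannot be reached from outside the instance unless something on the instance, such as nginx, proxies to it.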

Typically you have the front end server (#1 in the list above) listening on the public IP address. You should explicitly bind the server to one IP address.

However, if for some reason you are on DHCP or you don't know what the IP address will be (for example, it's a newly provisioned system), you can tell nginx/apache/any other process to bind to 0.0.0.0. This should be a temporary stop-gap measure.

For production servers you'll have a static IP. If you have a dynamic IP (DHCP), then you can leave in 0.0.0.0. It is very rare that you'll have DHCP for your production machines though.

Binding gunicorn/uwsgi to 0.0.0.0 is not recommended in production. If you bind your backend process (gunicorn/uwsgi) to 0.0.0.0, it may become accessible "directly", bypassing your front-end proxy (nginx/apache/etc); someone could just request http://your.public.ip.address:9000/ and access your application directly, especially if your front-end server (nginx) and your back-end process (django/uwsgi/gevent) are running on the same machine.
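For gunicorn, the bind address can also live in a config file, which is itself plain Python. A minimal sketch, assuming a file named gunicorn.conf.py and port 8000 (both chosen here for illustration):

```python
# gunicorn.conf.py (hypothetical file name and values, for illustration)
import multiprocessing

# Listen on loopback only, so the app is reachable solely through the
# front-end proxy running on the same machine.
bind = "127.0.0.1:8000"

# Worker count heuristic suggested in the gunicorn documentation:
# (2 x number of cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1
```

It would be started with something like gunicorn -c gunicorn.conf.py mysite.wsgi, where mysite.wsgi is a placeholder for your project's WSGI module.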

You are free to do it if you don't want to have the hassle of running a front-end proxy server though.

What is the logic behind the nginx config file? There are so many tutorials using drastically different configuration files that I'm confused about which one is better. For example, some people use "alias /path/to/static/folder" and others "root /path/to/static/folder". Maybe you can share your preferred configuration file.

The first thing you should know about nginx is that it is not just a webserver like Apache or IIS. It is primarily used as a proxy. So you'll see different terms like 'upstream'/'downstream' and multiple "servers" being defined. Take some time and go through the nginx manual first.

There are lots of different ways to set up nginx; but here is one answer to your question on alias vs. root. root is an explicit directive that sets the document root (the "home directory") of nginx. This is the directory it will look in when you make a request without a path, like http://www.example.com/

alias means "map a name to a directory". Aliased directories do not have to be subdirectories of the document root.

Why do we create a symlink between sites-available and sites-enabled in /etc/nginx?

This is something unique to Debian (and Debian-like systems such as Ubuntu). sites-available lists configuration files for all the virtual hosts/sites on the system. A symlink from sites-enabled to sites-available "activates" that site or virtual host. It is a way to separate configuration files and easily enable/disable hosts.
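The enable/disable mechanism is just symlink creation and removal. Here is a minimal sketch that works on any directory laid out like /etc/nginx; the function names and the conf_root parameter are hypothetical.

```python
import os
import tempfile


def enable_site(conf_root, site):
    """Symlink sites-available/<site> into sites-enabled; return the link path."""
    available = os.path.join(conf_root, "sites-available", site)
    enabled_dir = os.path.join(conf_root, "sites-enabled")
    os.makedirs(enabled_dir, exist_ok=True)
    link = os.path.join(enabled_dir, site)
    if not os.path.islink(link):
        os.symlink(available, link)
    return link


def disable_site(conf_root, site):
    """Remove the symlink; the file in sites-available is left untouched."""
    os.remove(os.path.join(conf_root, "sites-enabled", site))


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of the real /etc/nginx.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sites-available"))
        open(os.path.join(root, "sites-available", "example.conf"), "w").close()
        print(enable_site(root, "example.conf"))
```

On a real server nginx still has to be reloaded after a link is added or removed before the change takes effect.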



Answer 2:

I am not a deployment guru but will share some of my practices for deploying Django with gevent (should be similar to gunicorn though).

virtualenv is great for reasons I will not go into. I did, however, find virtualenvwrapper (docs) very useful, especially when you are working on many projects, since it lets you easily switch between the different virtualenvs. This does not really apply to the deployment environment; however, when I do need to troubleshoot on the server over SSH, I find it very useful. Another advantage of using it is that it manages the virtualenv directory, so there is less manual work for you. Virtualenvs are meant to be disposable, so that in case you have version issues, or any other install issues, you can just dump the env and create a new one. As a result, it is best practice not to include any of your project code within the virtualenv. It should be kept separate.

As for setting up multiple sites, virtualenv is pretty much the answer. You should have a separate virtualenv for each project. That alone can solve many issues. Then when you deploy, a different Python process will run each site, which avoids any possible conflicts between the deployments. One tool I found particularly useful for managing multiple sites on the same server is supervisor (docs). It provides an easy interface for starting, stopping and restarting different Django instances. It is also capable of auto-restarting a process when it fails or when the computer starts up. So for example, if some exception is raised and nothing catches it, the whole web site can go down. Supervisor will catch that and will restart the Django instance automatically. The following is a sample supervisor program (a single process) config:

[program:foo]
command=/path/to/virtualenv/bin/python deploy.py
directory=/path/where/deploy.py/is/located/
autostart=true
autorestart=true
redirect_stderr=True
user=www

For Nginx, I know it can be overwhelming at first. I found the Nginx book very useful. It explains all the major nginx directives.

In my nginx install, I found the best practice is to set up only the core configs in the nginx.conf file; I then have a separate folder, sites, where I keep the nginx configs for each of the sites I host. Then I just include all the files from that folder in the core config file. I use the directive include sites/+*.conf;. This way it only includes the files within the sites folder whose names start with the + symbol. That way I can control which config files get loaded just by the filename. So if I wish to disable a certain site, I just have to rename the config file and restart nginx. I'm not really sure what you meant by "symlink between site-available and sites-enabled in /etc/nginx" in your question, since those are Apache-named folders, but they accomplish a similar task to the include directive.

As for the root and alias directives, they are pretty much the same except for how the final path is calculated. With alias, the part of the URL matched by the location is dropped, whereas with root it is not. Imagine that you have the following nginx config:
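To see which files a mask like +*.conf would pick up, here is a small sketch. It assumes Python's fnmatch is a close enough stand-in for nginx's glob matching on simple file names, and the file names are invented for the example.

```python
from fnmatch import fnmatchcase


def loaded_configs(filenames, mask="+*.conf"):
    """Return the file names that the include mask would select, sorted."""
    return sorted(name for name in filenames if fnmatchcase(name, mask))


if __name__ == "__main__":
    # Hypothetical contents of the sites folder.
    names = ["+blog.conf", "shop.conf", "+wiki.conf", "-old.conf"]
    print(loaded_configs(names))
```

Renaming +blog.conf to blog.conf would drop it from the result, which mirrors how a site is disabled in this scheme.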

location /static {
    alias /some/path/;
}
location /static2 {
    root /some/other/path/;
}

If the user goes to these URLs, then nginx will try to look for the files in the following places on the system:

/static/hello/world.pdf => /some/path/hello/world.pdf
/static2/hello/world.pdf => /some/other/path/static2/hello/world.pdf
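The same mapping can be written as two small functions. This is a simplified model covering prefix locations only (no regex locations or other nginx features), and the function names are made up for illustration.

```python
import posixpath


def resolve_root(root, uri):
    """root: the full request URI is appended to the root path."""
    return posixpath.normpath(root + "/" + uri)


def resolve_alias(location, alias, uri):
    """alias: the matched location prefix is replaced by the alias path."""
    return posixpath.normpath(alias + "/" + uri[len(location):])


if __name__ == "__main__":
    print(resolve_alias("/static", "/some/path/", "/static/hello/world.pdf"))
    print(resolve_root("/some/other/path/", "/static2/hello/world.pdf"))
```

normpath is used here only to collapse the doubled slashes that plain string concatenation produces.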

This is a simple config for an nginx site:

server {
    server_name .foodomain.com;
    listen 80;

    access_log logs/foodomain.log;

    gzip                on;
    gzip_http_version   1.0;
    gzip_comp_level     2;
    gzip_proxied        any;
    gzip_min_length     1100;
    gzip_buffers        16 8k;
    # text/html is always compressed by nginx; listing it again only triggers a "duplicate MIME type" warning
    gzip_types          text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    # Some versions of IE 6 don't handle compression well on some mime-types, so just disable for them
    gzip_disable "MSIE [1-6].(?!.*SV1)";

    # Set a vary header so downstream proxies don't send cached gzipped content to IE6
    gzip_vary on;

    location / {
        proxy_read_timeout      30s;
        proxy_pass              http://localhost:8000;
        proxy_set_header        Host                 $host;
        proxy_set_header        User-Agent           $http_user_agent;
        proxy_set_header        X-Real-IP            $remote_addr;
    }

    location /media {
        alias   /path/to/media/;
        expires 1y;
    }

    location /static {
        autoindex on;
        expires   1y;
        alias     /path/to/static/;
    }

    location /favicon.ico {
        alias /path/to/favicon.ico;
    }
}
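With the proxy_set_header lines above, the Django process sees nginx as the connecting client, and the real client address arrives in the X-Real-IP header. Django exposes request headers in request.META with an HTTP_ prefix, so a helper along these lines could recover it. This is a sketch; the function name is hypothetical, and it should only be trusted when every request really comes through your own proxy.

```python
def client_ip(meta):
    """Return the client address from a request.META-style dict.

    Prefers the X-Real-IP header set by the front-end proxy and falls back
    to REMOTE_ADDR, which is the proxy's own address when behind nginx.
    """
    return meta.get("HTTP_X_REAL_IP") or meta.get("REMOTE_ADDR")


if __name__ == "__main__":
    # Example META contents as they might look behind the proxy above.
    print(client_ip({"HTTP_X_REAL_IP": "203.0.113.7", "REMOTE_ADDR": "127.0.0.1"}))
```

In a view this would be called as client_ip(request.META).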

Hopefully this helps you a bit.



Answer 3:

Well, since you asked about best practices, I can't help sharing a tool that worked wonders for me: django-fagungis. I used to get confused by the separate gunicorn, nginx and supervisord config files for several sites, and I wanted to automate the whole process so that I could make changes to my app/site and deploy it instantly. You can find details of my experience with Django deployment automation here. django-fagungis uses fabric to automate the whole process and creates a virtualenv on your remote server, which is VERY handy for managing the dependencies of several sites hosted on a single server. It uses nginx, gunicorn and supervisord to handle the Django project/site deployment. I configured a fabfile.py ONCE; now django-fagungis clones my latest project from bitbucket (which I use for version control) and deploys it on my remote server, and I just have to enter three commands in the shell on my local machine and that's it! For me, this has turned out to be the best and most hassle-free practice for Django deployment.



Answer 4:

See this post for the minimal gunicorn and nginx configuration required for a Django project: http://agiliq.com/blog/2013/08/minimal-nginx-and-gunicorn-configuration-for-djang/