NGINX Reverse Proxy Causes 502 Errors On Some Page

2019-08-03 16:44发布

问题:

I have a Node.js/Express application running on an Ubuntu server. It sits behind an NGINX reverse proxy that passes traffic on port 80 (or 443 for ssl) to the application's port.

I've recently had an issue where for no identifiable reason, traffic trying to access / will eventually get a 504 error and timeout. As a test, I increased the timeout and am now getting a 502 error. I can access some other routes on my application, /login for example, with no problems.

When I restart my Express application, my app runs fine with no issues, usually for a few days until this happens again. Viewing the logs for my Express app, a good request looks something like:

GET / 200 15.786 ms - 1214

Whereas requests that aren't responding properly look like this:

GET / - - ms - -

This application has been running properly for about 13 months with no issues, this issue has arisen with no prompting. I haven't pushed any updates within the time that this has occurred.

Here is my NGINX config (modified a bit for security, e.g. example.com)

upstream site_upstream {
    server 127.0.0.1:3000;
}

server {
    listen 80;
    listen 443 ssl;

    server_name example.com;
    ssl_certificate /etc/nginx/ssl/nginx.crt;
    ssl_certificate_key /etc/nginx/ssl/nginx.key;

    location / {
        proxy_pass http://site_upstream;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_redirect http://rpa_upstream https://example.com;
    }
}

I am unsure of if this an issue with my NGINX config or with my application itself as neither of my configurations have changed.

回答1:

It sounds like a memory leak in either nginx or your Node application. If it starts to work again after restarting your Node application but without restarting nginx then it seems it's a problem with your Node app.

Try also accessing your app directly without a proxy to see what problems do you have in that case. You can sometimes get more detailed info that way in your browser's developer tools or with command-line tools like curl or benchmarks like Apache ab. Running heavy benchmarks with ab can help you spot the problems more quickly instead of waiting.

Of course it's hard to say what's exactly the problem when you don't show any code.

If it was working fine before, and if you didn't upgrade anything (your app, any Node modules, or Node itself) during that time, then maybe your traffic increased slightly and now you start seeing the problems that were not manifesting before. Or maybe your system now uses more RAM for other tasks and the memory leak starts to be a problem quicker than before.

You can start logging data returned by process.memoryUsage() on a regular intervals and see if anything looks problematic.

Also monitor your Node processes with ps, top, htop or other commands, or see the memory usage /proc/PID/status etc.

You can also monitor /proc/meminfo on regular intervals and see if the total memory used in your system is correlated with your application getting unresponsive.

Another thing that may be causing problems is for example conenctions to your database responding slowly or not at all, if you are not handling errors and timeouts inside of your application. Adding more extensive logging (a line entering every route handler, a line before every I/O opertation starts and after every I/O operation either succeeds or fails or times out) should give you some more insight into it.