This might be the impossible issue. I've tried everything. I feel like there's a guy at a switchboard somewhere, twirling his mustache.
The problem:
I have Amazon EC2 running an application. It functions without issue when there is only one instance and no load balancer.
But in my production environment I have two identical instances running behind one load balancer. When performing certain tasks, like a feature that generates a PDF and attaches it to an email, nothing happens at all, and the Network tab in Google Developer Tools shows the error "504 Gateway Timeout" once the timeout hits (I have it set at 30 seconds).
My Database is external, on Amazon RDS.
I think.... If I could force a client to stay connected to their initial server they logged in at, this problem would be solved, because it's my understanding that the 504 Gateway Timeout is happening when instance-1 tries to reach out to instance-2 to perform the task.
This happens ONLY WHEN using Load Balancing, but never when connecting straight to one of my two servers.
Load Balancer Settings:
- The load balancer has a CNAME record on my Registrar so that app.myapplication.com points to myloadbalancerDNSname.elb.amazonaws.com
- The load balancer has 2 healthy instances, each in the same region but they are in different availability zones.
- The load balancer is using the same Security Groups as the Instances (allow ALL IPs on ports 22, 80, and 443)
- The load balancer has cross-zone load balancing turned on.
- CORS (in Amazon S3) is enabled to GET, POST, PUT, DELETE from * to * (I have no idea how this is associated with my instances but anyway I did it as the instructions said)
- The load balancer has listeners configured as such:
- Load Balancer Protocol:HTTP Load Balancer Port:80 Instance Protocol:HTTP Instance Port:80
- Load Balancer Protocol:HTTPS Load Balancer Port:443 Instance Protocol:HTTP Instance Port:80 (cipher chosen correctly per my Cert provider, and SSL fields 100% surely correct)
Some more ideas:
That being said, I'm not testing with HTTPS, but normal HTTP instead. I'm not convinced SSL is set up properly even though my certificate provider said it is. The reason I'm suspicious is that when I try to key in https://app.myapplication.com I get the error "(failed) net::ERR_CONNECTION_CLOSED" in Google Developer Tools, in the Network tab. But this should not matter here, because I'm having the problem even using regular HTTP. I can troubleshoot SSL later.
So to reiterate, my problem is having the "504 Gateway Timeout" problem when using some functions, but also occasionally at random instead of loading the page (but rarely). This 504 problem happens ONLY WHEN using Load Balancing, but never when connecting straight to one of my two instances.
I don't know which question to ask, because I've followed every document to the T, double and triple checked all suggestions all over the web, and NOTHING.
In my case, it turns out that there was no problem with the load balancer. The final solution ended up being Ubuntu's hosts file, in which there was an inexplicable entry that pointed my application's host name at some mystery IP. So, during the process of creating the PDF, paths were getting rewritten by the PDF generator to point at the mystery server, hence the Gateway Timeout issues. I have no idea why it occasionally worked instead of failing.
This is what it looked like, so I removed that third line and all the gears started turning again. :P
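If you want to check for the same kind of stray entry, here is a minimal Python sketch that lists every hosts-file line naming a given host name. The function names are mine and purely illustrative; it assumes the standard "IP name [name ...]" hosts format and is not the exact check I ran.

```python
def find_host_entries(hosts_text, hostname):
    """Return (line_number, ip) pairs for each hosts line that names hostname."""
    matches = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        # Host names are case-insensitive, so compare in lower case.
        if hostname.lower() in (name.lower() for name in names):
            matches.append((number, ip))
    return matches


def check_hosts_file(hostname, path="/etc/hosts"):
    """Read the hosts file at path (assumed location) and report matches."""
    with open(path) as handle:
        return find_host_entries(handle.read(), hostname)
```

Calling check_hosts_file("app.myapplication.com") on each instance should return an empty list unless something in the hosts file overrides DNS for that name.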
We use Amazon EC2 instances behind an Amazon ELB and we were getting 504 GATEWAY_TIMEOUT errors. We use Apache and PHP on Ubuntu web servers.
In our case, the error was due to the servers running out of memory. We didn't see the "out of memory" in our Apache error logs. There was a 504 line entry in the Apache access logs. We confirmed the "out of memory" by looking into the syslog file ( /var/log/syslog ) and fixed the memory issue.
This resolved the 504 error for us.
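For anyone who wants to run the same check, a rough Python sketch for pulling memory-exhaustion lines out of syslog follows. The search phrases are the ones the Linux kernel commonly logs when it runs out of memory; the function names are hypothetical, and the default path is the Ubuntu location mentioned above.

```python
import re

# Phrases the Linux kernel typically logs on memory exhaustion (case-insensitive).
OOM_PATTERN = re.compile(r"out of memory|oom-killer|oom_kill", re.IGNORECASE)


def find_oom_lines(log_lines):
    """Return the log lines that mention an out-of-memory event."""
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]


def scan_syslog(path="/var/log/syslog"):
    """Scan a syslog file; reading it usually needs root or adm group rights."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

If scan_syslog() returns lines whose timestamps line up with the 504 entries in the Apache access log, memory is a likely suspect.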
First, what is the Idle Timeout for your ELB set to? You'll find it at the very bottom of the "Description" tab for your load balancer. You can read more about the idle timeout here in the ELB documentation. The default is 60 seconds. You should also consider setting or increasing Keep-alive in your web server. How you do that will depend on what web server you are using.
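If you prefer to read or change the idle timeout from a script instead of the console, something like this sketch should work with boto3 against a Classic Load Balancer. The load balancer name "my-load-balancer" and the 120-second value in the usage comment are placeholders, not values from the question.

```python
def get_idle_timeout(elb_client, load_balancer_name):
    """Return the idle timeout, in seconds, of a Classic Load Balancer."""
    response = elb_client.describe_load_balancer_attributes(
        LoadBalancerName=load_balancer_name
    )
    return response["LoadBalancerAttributes"]["ConnectionSettings"]["IdleTimeout"]


def set_idle_timeout(elb_client, load_balancer_name, seconds):
    """Set the idle timeout, in seconds, of a Classic Load Balancer."""
    if seconds < 1:
        raise ValueError("idle timeout must be at least 1 second")
    elb_client.modify_load_balancer_attributes(
        LoadBalancerName=load_balancer_name,
        LoadBalancerAttributes={"ConnectionSettings": {"IdleTimeout": seconds}},
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   elb = boto3.client("elb")
#   print(get_idle_timeout(elb, "my-load-balancer"))
#   set_idle_timeout(elb, "my-load-balancer", 120)
```

Passing the client in as an argument keeps the functions easy to try against a stub before pointing them at a real load balancer.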
Second, if you think it's due to the client being switched from one instance to the other then you should enable session stickiness in the ELB. This will ensure that a client is always directed to the same back-end instance by the load balancer. To enable this, again go to the "Description" tab then click on the Edit link next to each entry in the Port Configuration section. You'll likely want to choose the "Enable Load Balancer Generated Cookie Stickiness" option since that will tell the ELB to manage all aspects of the stickiness.
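The same stickiness setting can be scripted. Below is a minimal sketch using boto3's Classic Load Balancer client; the policy name is an arbitrary placeholder, and you would call it once per listener port (80 and 443 in the question).

```python
def enable_lb_cookie_stickiness(elb_client, load_balancer_name, listener_port,
                                policy_name="lb-cookie-stickiness",
                                expiration_seconds=None):
    """Create a load-balancer-generated cookie policy and attach it to a listener."""
    params = {"LoadBalancerName": load_balancer_name, "PolicyName": policy_name}
    if expiration_seconds is not None:
        # When omitted, the sticky session lasts for the browser session.
        params["CookieExpirationPeriod"] = expiration_seconds
    elb_client.create_lb_cookie_stickiness_policy(**params)
    # Note: this call replaces whatever policies the listener currently has.
    elb_client.set_load_balancer_policies_of_listener(
        LoadBalancerName=load_balancer_name,
        LoadBalancerPort=listener_port,
        PolicyNames=[policy_name],
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   elb = boto3.client("elb")
#   enable_lb_cookie_stickiness(elb, "my-load-balancer", 80)
```

Creating the policy a second time under the same name will fail, so pick a fresh policy_name or skip the create step if the policy already exists.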
Most probably the idle timeout is the culprit; its default value is 60 seconds. See the AWS ALB documentation.
What web server are you using? I had a very similar issue with nginx and AWS load balancing. I added
keepalive_timeout 75s;
to the http block in my nginx config file and haven't seen the issue since. Make sure you restart nginx after you add and save that line. On Ubuntu:
sudo service nginx restart
On Red Hat, stop nginx with
/path/to/nginx/executable -s stop
and then run
/path/to/nginx/executable
to start nginx again. This fix was recommended by AWS on their help page, "AWS Load balancer troubleshooting".
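As I understand the reasoning, the point of 75s is that the web server's keep-alive should outlast the load balancer's idle timeout (60 seconds by default), so nginx does not close a connection the ELB still considers open. Here is a small Python sketch to check a config for that. It is a simplification: it only reads the first uncommented keepalive_timeout line and only handles single-unit values such as 75, 75s, or 2m. The function names are my own.

```python
import re

# Seconds per nginx time unit (only the units this sketch handles).
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

_DIRECTIVE = re.compile(r"^\s*keepalive_timeout\s+(\d+)(ms|[smhd])?", re.MULTILINE)


def keepalive_timeout_seconds(nginx_config_text):
    """Return the first keepalive_timeout in seconds, or None if it is not set."""
    match = _DIRECTIVE.search(nginx_config_text)
    if match is None:
        return None
    value, unit = match.groups()
    # nginx treats a bare number as seconds.
    return int(value) * _UNIT_SECONDS[unit or "s"]


def keepalive_outlasts_idle_timeout(nginx_config_text, elb_idle_timeout=60):
    """True or False when the directive is present, None when it is absent."""
    keepalive = keepalive_timeout_seconds(nginx_config_text)
    if keepalive is None:
        return None
    return keepalive > elb_idle_timeout
```

Feed it the contents of your nginx config file and the idle timeout shown on the load balancer's "Description" tab; a False result means the two settings are worth revisiting.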
Check the security group settings. Access to port 80 may be restricted.