I have a standard LAMP EC2 instance set-up running on Amazon's AWS. Having also installed Node.js, socket.io and Express to meet the demands of live updating, I am now at the stage of load balancing the application. That's all working, but my sockets aren't. This is how my set-up looks:-
--- EC2 >> Node.js + socket.io
/
Client >> ELB --
\
--- EC2 >> Node.js + socket.io
[RDS MySQL - EC2 instances communicate to this]
As you can see, each instance has an installation of Node and socket.io. However, occasionally Chrome debug will 400 the socket request returning the reason {"code":1,"message":"Session ID unknown"}
, and I guess this is because it's communicating to the other instance.
Additionally, let's say I am on page A and the socket needs to emit to page B - because of the load balancer these two pages might well be on a different instance (they will both be open at the same time). Using something like Sticky Sessions, to my knowledge, wouldn't work in that scenario because both pages would be restricted to their respective instances.
How can I get around this issue? Will I need a whole dedicated instance just for Node? That seems somewhat overkill...
The solution would be for the two(or more) node.js installs to use a common session source.
Here is a previous question on using REDIS as a common session store for node.js How to share session between NodeJs and PHP using Redis?
and another Node.js Express sessions using connect-redis with Unix Domain Sockets
The issues come up when you consider both websocket traffic (layer 4 -ish) and HTTP traffic (layer 7) moving across a load balancer that can only inspect one layer at a time. For example, if you set the ELB to load balance on layer 7 (HTTP/HTTPS) then websockets will not work at all across the ELB. However, if you set the ELB to load balance on layer 4 (TCP) then any fallback HTTP polling requests could end up at any of the upstream servers.
You have two options here. You can figure out a way to effectively load balance both HTTP and websocket requests or find a way to deterministically map requests to upstream servers regardless of the protocol.
The first one is pretty involved and requires another load balancer. A good walkthrough can be found here. It's worth noting that when that post was written HAProxy didn't have native SSL support. Now that this is the case it might be possible to just remove the ELB entirely, if that's the route you want to go. If that's the case the second option might be better.
Otherwise you can use HAProxy on its own (or a paid version of Nginx) to implement a deterministic load balancing mechanism. In this case you would use IP hashing since socket.io does not provide a route-based mechanism to identify a particular server like sockjs. This would use the first 3 octets of the IP address to determine which upstream server gets each request so unless the user changes IP addresses between HTTP polls then this should work.