I am unable to scale my simple Socket.IO app past around 980 concurrent connections using Docker. However, if I run it locally on my macOS Sierra 10.12.6 I can get over 3000 connections. I have included this repo of a simple SocketIO application that I am testing with: https://github.com/gsccheng/simple-socketIO-app
My Docker-for-Mac is configured at 4 CPUs and 5 GB memory. The Version is
Version 17.09.0-ce-mac35 (19611)
Channel: stable
a98b7c1b7c
I am using Artillery version 1.6.0-9 to load test it with:
$ artillery run load-test.yaml
I'm showing some redundant configurations of the settings (to show you that they have been considered). Here are my steps to reproduce.
$ docker build . -t socket-test
$ docker run -p 8000:8000 -c 1024 -m 4096M --privileged --ulimit nofile=9000:9000 -it socket-test:latest /bin/sh
#> DEBUG=* npm start
Up to around 980 connections I will get logs like this:
Connected to Socket!
socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
socket.io:socket joined room 0ohCcHMWYASnfRgJAAPS +0ms
engine:ws received "2" +5ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine upgrading existing transport +2ms
engine:socket might upgrade socket transport from "polling" to "websocket" +0ms
engine intercepting request for path "/socket.io/" +2ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqL&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "28:42["news",{"hello":"world"}]" +0ms
engine:socket executing batch send callback +1ms
engine:ws received "2probe" +4ms
engine:ws writing "3probe" +0ms
engine intercepting request for path "/socket.io/" +4ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqV&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
engine:socket writing a noop packet to polling for fast upgrade +10ms
engine:polling writing "1:6" +0ms
engine:ws received "5" +2ms
engine:socket got upgrade packet - upgrading +0ms
engine:polling closing +0ms
engine:polling transport discarded - closing right away +1ms
engine:ws received "2" +20ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +1ms
engine:ws writing "3" +0ms
engine intercepting request for path "/socket.io/" +1ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfr1&b64=1" +0ms
engine handshaking client "6ccAiZwbvrchxZEiAAPT" +0ms
engine:socket sending packet "open" ({"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:socket sending packet "message" (0) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "97:0{"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
engine:socket executing batch send callback +0ms
socket.io:server incoming connection with id 6ccAiZwbvrchxZEiAAPT +0ms
socket.io:client connecting to namespace / +1ms
socket.io:namespace adding socket to nsp / +0ms
socket.io:socket socket connected - writing packet +0ms
socket.io:socket joining room 6ccAiZwbvrchxZEiAAPT +0ms
socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
At about 980 connections I will begin seeing these disconnected events:
disconnected to Socket!
transport close
engine intercepting request for path "/socket.io/" +27ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1T&b64=1" +0ms
engine handshaking client "C-pdSXFCbwQaTeYLAAPh" +0ms
engine:socket sending packet "open" ({"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:socket sending packet "message" (0) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "97:0{"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
engine:socket executing batch send callback +0ms
socket.io:server incoming connection with id C-pdSXFCbwQaTeYLAAPh +0ms
socket.io:client connecting to namespace / +0ms
socket.io:namespace adding socket to nsp / +0ms
socket.io:socket socket connected - writing packet +1ms
socket.io:socket joining room C-pdSXFCbwQaTeYLAAPh +0ms
socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
socket.io:socket joined room C-pdSXFCbwQaTeYLAAPh +0ms
engine intercepting request for path "/socket.io/" +13ms
engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1g&b64=1&sid=C-pdSXFCbwQaTeYLAAPh" +0ms
engine setting new request for existing client +1ms
engine:polling received "1:1" +0ms
engine:polling got xhr close packet +0ms
socket.io:client client close with reason transport close +0ms
socket.io:socket closing socket - reason transport close +1ms
disconnected to Socket!
Then it'll be this repeated over and over again:
engine:ws writing "3" +0ms
engine:ws received "2" +42ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +1ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +4ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +45ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +7ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
As you can see in my Dockerfile, I have set a few configurations that I've gathered from googling my problem:
COPY limits.conf /etc/security/
COPY sysctl.conf /etc/
COPY rc.local /etc/
COPY common-session /etc/pam.d/
COPY common-session-noninteractive /etc/pam.d/
COPY supervisord.conf /etc/supervisor/
On my local system I've also done a few configurations like following this example. Here is the state of my host machine:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 64000
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
What can I do to get more than ~980 concurrent socket connections? Why do I fail to make any more connections at that point? How can my repo be tweaked (if needed) to get this to work?
Edit
When I lower the nofile limit to, say, 500 for the container, the disconnects appear to happen in the same way. When I increase or decrease the memory and CPU by, say, half or double, I don't see any difference in behavior, so it doesn't seem like that is the issue.
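One way to check which open-file limit the Node process actually ends up with inside the container, as opposed to the value passed to docker run, is to read /proc/self/limits from within that process. The sketch below is illustrative only: the helper names parseMaxOpenFiles and currentOpenFileLimit are made up for this example and are not part of the linked repo, and /proc is assumed to be available (Linux container or VM, not macOS).

```javascript
const fs = require('fs');

// Extract the soft and hard "Max open files" values from the text of
// /proc/<pid>/limits. Returns null if that row is not present.
function parseMaxOpenFiles(limitsText) {
  const match = limitsText.match(/^Max open files\s+(\S+)\s+(\S+)/m);
  if (!match) return null;
  const toNumber = (value) => (value === 'unlimited' ? Infinity : Number(value));
  return { soft: toNumber(match[1]), hard: toNumber(match[2]) };
}

// Read the limits of the current process. /proc does not exist on macOS,
// so this returns null when run outside a Linux container or VM.
function currentOpenFileLimit() {
  try {
    return parseMaxOpenFiles(fs.readFileSync('/proc/self/limits', 'utf8'));
  } catch (err) {
    return null;
  }
}

console.log('open-file limit seen by this process:', currentOpenFileLimit());
```

Calling currentOpenFileLimit() from the application itself at startup, rather than from the interactive shell, shows the limit that applies to the process holding the sockets, in case something in the start-up chain changes it.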
There's a significant difference between the network path to the app locally and the app running in Docker for Mac.
The path to your app on the mac is straight in via the loopback interface:
When using Docker for Mac, the path in has more hops and passes through two userland proxy processes, vpnkit on your Mac and docker-proxy, which accept TCP connections on the forwarded port and forward data in:
Try with a VM that has a network directly accessible to the Mac to see if vpnkit is making an appreciable difference.
You can also remove docker-proxy by attaching the container's interface directly to the VM network so the container doesn't require the port mapping (-p). This can be done by mapping a macvlan interface to the container or placing the container on a bridge attached to the VM network. This is a vagrant setup I use for the bridged network.
Once you've got rid of the network differences, I'd look at tuning the VM and container in a bit more detail. I'd guess you should see a 10-20% decrease in the VM, not 66%.
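To compare the network paths independently of Socket.IO, a plain TCP probe can be pointed first at the app running locally and then at the Docker-published port. This is a rough sketch using only Node's built-in net module; probeConnections is a hypothetical helper name, and the host, port and count come from whatever you pass in.

```javascript
const net = require('net');

// Open `count` plain TCP connections to host:port, keep them open until
// every attempt has either connected or errored, then close them all.
function probeConnections(host, port, count) {
  return new Promise((resolve) => {
    if (count <= 0) {
      resolve({ connected: 0, failed: 0 });
      return;
    }
    const sockets = [];
    let connected = 0;
    let failed = 0;
    for (let i = 0; i < count; i += 1) {
      const socket = net.connect({ host, port });
      sockets.push(socket);
      let counted = false; // count each socket once, even if it errors later
      const settle = (ok) => {
        if (counted) return;
        counted = true;
        if (ok) connected += 1; else failed += 1;
        if (connected + failed === count) {
          sockets.forEach((s) => s.destroy()); // release every descriptor
          resolve({ connected, failed });
        }
      };
      socket.on('connect', () => settle(true));
      socket.on('error', () => settle(false));
    }
  });
}

// Optional CLI use: node probe.js <host> <port> <count>
if (process.argv.length >= 5) {
  const [host, port, count] = process.argv.slice(2);
  probeConnections(host, Number(port), Number(count)).then(console.log);
}
```

Note that when a proxy sits in the path, a successful connect only shows that the proxy accepted the connection, not that the container did, so the numbers are a starting point for comparison rather than a definitive measurement. The probing machine's own open-file limit also caps how many sockets it can hold.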
I ran into the same engine:polling got xhr close packet log line. I searched Stack Overflow, and this question is the only one that mentions it.
I investigated briefly. The client sends both GET and POST HTTP requests, and in our case the load balancer somehow rejected the GET while the POST still went through, so this also happens on our sites.
The problem comes down to the stability of the load balancer, especially the stability of its sticky sessions.
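To illustrate why sticky sessions matter for the polling transport, here is a toy simulation. It is not a model of any real load balancer: simulate, roundRobin, sticky and hashSid are all hypothetical names, and the only assumption is that a backend rejects a polling request whose session id it did not create itself.

```javascript
// Sum of character codes: a deliberately simple, deterministic hash.
function hashSid(sid) {
  let total = 0;
  for (const ch of sid) total += ch.charCodeAt(0);
  return total;
}

// Routing strategies: (sid, requestNumber, backendCount) -> backend index.
const roundRobin = (sid, requestNumber, backendCount) => requestNumber % backendCount;
const sticky = (sid, requestNumber, backendCount) => hashSid(sid) % backendCount;

// Count how many follow-up requests reach a backend that does not know the sid.
function simulate(pickBackend, backendCount, requestsPerSession, sessions) {
  const known = Array.from({ length: backendCount }, () => new Set());
  let rejected = 0;
  let requestNumber = 0;
  for (let s = 0; s < sessions; s += 1) {
    const sid = `sid-${s}`;
    // Handshake: whichever backend receives it creates the session.
    known[pickBackend(sid, requestNumber++, backendCount)].add(sid);
    // Follow-up polling GET/POST requests carry the sid.
    for (let r = 0; r < requestsPerSession; r += 1) {
      const target = pickBackend(sid, requestNumber++, backendCount);
      if (!known[target].has(sid)) rejected += 1;
    }
  }
  return rejected;
}

console.log('round robin rejected:', simulate(roundRobin, 2, 2, 3));
console.log('sticky rejected:', simulate(sticky, 2, 2, 3));
```

With two backends and two follow-up requests per session, round-robin routing sends one of the two requests to the wrong backend, while routing by session id never does. A flaky sticky-session setup would sit somewhere in between.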