Cannot get more than ~980 connections with Socket.

Posted 2019-07-21 01:54

I am unable to scale my simple Socket.IO app past around 980 concurrent connections when it runs in Docker. However, if I run it locally on macOS Sierra 10.12.6, I can get over 3000 connections. Here is the repo of the simple Socket.IO application that I am testing with: https://github.com/gsccheng/simple-socketIO-app

My Docker-for-Mac is configured at 4 CPUs and 5 GB memory. The Version is

Version 17.09.0-ce-mac35 (19611)
Channel: stable
a98b7c1b7c

I am using Artillery version 1.6.0-9 to load test it with

$ artillery run load-test.yaml

I'm showing some redundant configurations of the settings (to show you that they have been considered). Here are my steps to reproduce.

$ docker build . -t socket-test
$ docker run -p 8000:8000 -c 1024 -m 4096M --privileged --ulimit nofile=9000:9000 -it socket-test:latest /bin/sh
#> DEBUG=* npm start

Up to around 980 connections I will get logs like this:

Connected to Socket!
  socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
  socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
  socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
  engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
  socket.io:socket joined room 0ohCcHMWYASnfRgJAAPS +0ms
  engine:ws received "2" +5ms
  engine:socket packet +0ms
  engine:socket got ping +0ms
  engine:socket sending packet "pong" (undefined) +0ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "3" +0ms
  engine upgrading existing transport +2ms
  engine:socket might upgrade socket transport from "polling" to "websocket" +0ms
  engine intercepting request for path "/socket.io/" +2ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqL&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
  engine setting new request for existing client +0ms
  engine:polling setting request +0ms
  engine:socket flushing buffer to transport +0ms
  engine:polling writing "28:42["news",{"hello":"world"}]" +0ms
  engine:socket executing batch send callback +1ms
  engine:ws received "2probe" +4ms
  engine:ws writing "3probe" +0ms
  engine intercepting request for path "/socket.io/" +4ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqV&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
  engine setting new request for existing client +0ms
  engine:polling setting request +0ms
  engine:socket writing a noop packet to polling for fast upgrade +10ms
  engine:polling writing "1:6" +0ms
  engine:ws received "5" +2ms
  engine:socket got upgrade packet - upgrading +0ms
  engine:polling closing +0ms
  engine:polling transport discarded - closing right away +1ms
  engine:ws received "2" +20ms
  engine:socket packet +0ms
  engine:socket got ping +0ms
  engine:socket sending packet "pong" (undefined) +0ms
  engine:socket flushing buffer to transport +1ms
  engine:ws writing "3" +0ms
  engine intercepting request for path "/socket.io/" +1ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfr1&b64=1" +0ms
  engine handshaking client "6ccAiZwbvrchxZEiAAPT" +0ms
  engine:socket sending packet "open" ({"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
  engine:socket sending packet "message" (0) +0ms
  engine:polling setting request +0ms
  engine:socket flushing buffer to transport +0ms
  engine:polling writing "97:0{"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
  engine:socket executing batch send callback +0ms
  socket.io:server incoming connection with id 6ccAiZwbvrchxZEiAAPT +0ms
  socket.io:client connecting to namespace / +1ms
  socket.io:namespace adding socket to nsp / +0ms
  socket.io:socket socket connected - writing packet +0ms
  socket.io:socket joining room 6ccAiZwbvrchxZEiAAPT +0ms
  socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!

At about 980 connections I will begin seeing these disconnected events:

disconnected to Socket!
transport close
  engine intercepting request for path "/socket.io/" +27ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1T&b64=1" +0ms
  engine handshaking client "C-pdSXFCbwQaTeYLAAPh" +0ms
  engine:socket sending packet "open" ({"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
  engine:socket sending packet "message" (0) +0ms
  engine:polling setting request +0ms
  engine:socket flushing buffer to transport +0ms
  engine:polling writing "97:0{"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
  engine:socket executing batch send callback +0ms
  socket.io:server incoming connection with id C-pdSXFCbwQaTeYLAAPh +0ms
  socket.io:client connecting to namespace / +0ms
  socket.io:namespace adding socket to nsp / +0ms
  socket.io:socket socket connected - writing packet +1ms
  socket.io:socket joining room C-pdSXFCbwQaTeYLAAPh +0ms
  socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
  socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
  socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
  socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
  engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
  socket.io:socket joined room C-pdSXFCbwQaTeYLAAPh +0ms
  engine intercepting request for path "/socket.io/" +13ms
  engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1g&b64=1&sid=C-pdSXFCbwQaTeYLAAPh" +0ms
  engine setting new request for existing client +1ms
  engine:polling received "1:1" +0ms
  engine:polling got xhr close packet +0ms
  socket.io:client client close with reason transport close +0ms
  socket.io:socket closing socket - reason transport close +1ms
disconnected to Socket!

Then it'll be this repeated over and over again:

  engine:ws writing "3" +0ms
  engine:ws received "2" +42ms
  engine:socket packet +0ms
  engine:socket got ping +0ms
  engine:socket sending packet "pong" (undefined) +1ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "3" +0ms
  engine:ws received "2" +4ms
  engine:socket packet +0ms
  engine:socket got ping +0ms
  engine:socket sending packet "pong" (undefined) +0ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "3" +0ms
  engine:ws received "2" +45ms
  engine:socket packet +0ms
  engine:socket got ping +0ms
  engine:socket sending packet "pong" (undefined) +0ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "3" +0ms
  engine:ws received "2" +7ms
  engine:socket packet +0ms
  engine:socket got ping +0ms
  engine:socket sending packet "pong" (undefined) +0ms
  engine:socket flushing buffer to transport +0ms
  engine:ws writing "3" +0ms
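
To pin down exactly where the ceiling sits, the debug output can be tallied instead of read by eye. The following is a minimal sketch, assuming the app prints the "Connected to Socket!" and "disconnected to Socket!" lines exactly as quoted above; the function name tallyLog and the log file name in the usage note are made up for illustration.

```javascript
// Tally connect/disconnect events in the server's DEBUG output.
// The two marker strings are the app's own log lines quoted in the question.
function tallyLog(text) {
  const summary = { connected: 0, disconnected: 0, peak: 0, reasons: {} };
  let open = 0; // connections currently believed to be open
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "Connected to Socket!") {
      summary.connected += 1;
      open += 1;
      summary.peak = Math.max(summary.peak, open);
    } else if (line === "disconnected to Socket!") {
      summary.disconnected += 1;
      open = Math.max(0, open - 1); // a log excerpt may start mid-stream
    } else {
      // e.g. "socket.io:socket closing socket - reason transport close +1ms"
      const match = line.match(/closing socket - reason (.+?) \+\d+ms$/);
      if (match) {
        summary.reasons[match[1]] = (summary.reasons[match[1]] || 0) + 1;
      }
    }
  }
  return summary;
}
```

One way to use it: capture both output streams with something like `DEBUG=* npm start > server.log 2>&1` (the debug module writes to stderr by default), then call tallyLog(require("fs").readFileSync("server.log", "utf8")) and compare the peak value between the local run and the Docker run.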

As you can see in my Dockerfile, I have set a few configurations that I've gathered from googling my problem:

COPY limits.conf /etc/security/
COPY sysctl.conf /etc/
COPY rc.local /etc/
COPY common-session /etc/pam.d/
COPY common-session-noninteractive /etc/pam.d/
COPY supervisord.conf /etc/supervisor/

On my host machine I've also applied a few settings, following this example. Here is the current state of the host:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 64000
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2048
virtual memory          (kbytes, -v) unlimited

What can I do to get more than ~980 concurrent socket connections? Why do I fail to make any more connections at that point? How can my repo be tweaked (if needed) to get this to work?

Edit


When I lower the nofile limit for the container to, say, 500, my application's connections seem to fail in the same way. When I halve or double the memory and CPU, I don't see any difference in behavior, so those don't seem to be the issue.
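
Because the --ulimit flag is applied to the container and then inherited through /bin/sh and npm, it may be worth confirming which limit the Node.js process itself ends up with. This is a minimal sketch, assuming a Linux container where /proc/self/limits is readable; the function names are hypothetical and not part of the repo.

```javascript
const fs = require("fs");

// Extract the soft and hard "Max open files" values from the text of
// /proc/<pid>/limits. The value "unlimited" is mapped to Infinity.
function parseOpenFilesLimit(limitsText) {
  const match = limitsText.match(/^Max open files\s+(\S+)\s+(\S+)/m);
  if (!match) return null;
  const toNumber = (value) => (value === "unlimited" ? Infinity : Number(value));
  return { soft: toNumber(match[1]), hard: toNumber(match[2]) };
}

// Read the limit that applies to this Node.js process (Linux only).
function readOwnOpenFilesLimit() {
  try {
    return parseOpenFilesLimit(fs.readFileSync("/proc/self/limits", "utf8"));
  } catch (err) {
    return null; // no /proc here, e.g. when run directly on macOS
  }
}
```

Logging readOwnOpenFilesLimit() once at application startup shows whether the 9000 passed on the command line is the value the server process actually runs under.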

2 answers
[account banned]
#2 · 2019-07-21 02:39

There's a significant difference between the network path to the app locally and the app running in Docker for Mac.

The path to your app on the mac is straight in via the loopback interface:

          mac  
client -> lo -> nodejs

With Docker for Mac, the inbound path has more hops, including two userland proxy processes: vpnkit on your Mac and docker-proxy in the VM. Each accepts TCP connections on the forwarded port and forwards the data on:

      mac               |                 vm                   |  container    
client -> lo -> vpnkit -> if -> docker-proxy -> NAT -> bridge -> if -> nodejs

Try with a VM that has a network directly accessible to the mac to see if vpnkit is making an appreciable difference.

  mac         |                vm                    |  container
client -> if -> if -> docker-proxy -> NAT -> bridge -> if -> nodejs

You can also remove docker-proxy by attaching the container's interface directly to the VM network, so the container doesn't require the port mapping (-p). This can be done by mapping a macvlan interface to the container or by placing the container on a bridge attached to the VM network. This is a Vagrant setup I use for the bridged network.

  mac         |  container   <- there is a little vm here, but minimal. 
client -> if -> if -> nodejs

  mac         |      vm       |  container
client -> if -> if -> bridge -> if -> nodejs

Once you've got rid of the network differences, I'd look at tuning the VM and container in a bit more detail. I'd guess you should see a 10-20% decrease in capacity in the VM, not the roughly 66% you are seeing (about 980 connections versus more than 3000).
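
To compare the network paths without Socket.IO or Artillery in the picture, plain TCP connections can be counted directly. The following is a rough sketch using only Node's built-in net module; the function name openConnections, the timeout default, and the host and port in the usage note are assumptions for illustration. It opens every connection in one burst, which is harsher than a ramped load test.

```javascript
const net = require("net");

// Open `count` plain TCP connections to host:port and hold them open
// until every attempt has either connected, failed, or timed out.
function openConnections(host, port, count, timeoutMs = 5000) {
  const sockets = [];
  const attempts = [];
  for (let i = 0; i < count; i++) {
    attempts.push(
      new Promise((resolve) => {
        const socket = net.connect({ host, port });
        sockets.push(socket);
        socket.setTimeout(timeoutMs);
        socket.once("connect", () => {
          socket.setTimeout(0); // connected: stop the idle timer
          resolve(true);
        });
        socket.once("timeout", () => {
          socket.destroy();
          resolve(false);
        });
        // Later calls to resolve() are no-ops, so a reset after a
        // successful connect does not change the recorded result.
        socket.on("error", () => resolve(false));
      })
    );
  }
  return Promise.all(attempts).then((results) => {
    sockets.forEach((socket) => socket.destroy());
    const opened = results.filter(Boolean).length;
    return { opened, failed: count - opened };
  });
}
```

For example, openConnections("localhost", 8000, 3000).then(console.log) could be run once against the app started locally and once against the published Docker port. If the opened count drops sharply only on the Docker path, that points at the hops in between rather than at the app.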

放荡不羁爱自由
#3 · 2019-07-21 02:39

I ran into the same "engine:polling got xhr close packet" message. I searched Stack Overflow, and this question is the only one that mentions it.

I looked into it briefly. In our case, when the client sends both GET and POST HTTP requests, the load balancer somehow rejects the GET while the POST may still go through, so this happens on our sites as well.

The problem most likely comes down to the stability of the load balancer, especially the stability of its sticky sessions.
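
For reading that log line, it helps to decode the polling payloads shown in the question's logs. The following is a minimal sketch, assuming the Engine.IO protocol version 3 string framing (the logs show EIO=3), where each packet is sent as a length, a colon, and then the packet, whose first character is the packet type. The function name decodePayload is made up, and lengths are counted in JavaScript string characters, which is an approximation for non-ASCII data.

```javascript
// Engine.IO v3 packet types, indexed by the leading digit of a packet.
const PACKET_TYPES = ["open", "close", "ping", "pong", "message", "upgrade", "noop"];

// Decode a string polling payload such as "97:0{...}2:40" into packets.
function decodePayload(payload) {
  const packets = [];
  let index = 0;
  while (index < payload.length) {
    const colon = payload.indexOf(":", index);
    if (colon === -1) throw new Error("missing length prefix");
    const lengthText = payload.slice(index, colon);
    if (!/^[1-9]\d*$/.test(lengthText)) throw new Error("invalid length prefix");
    const length = Number(lengthText);
    const packet = payload.slice(colon + 1, colon + 1 + length);
    if (packet.length !== length) throw new Error("truncated packet");
    if (!/^[0-6]$/.test(packet[0])) throw new Error("unknown packet type");
    packets.push({ type: PACKET_TYPES[Number(packet[0])], data: packet.slice(1) });
    index = colon + 1 + length;
  }
  return packets;
}
```

Under those assumptions, the "1:1" received just before "got xhr close packet" decodes to a single close packet, and the "1:6" written during the upgrade decodes to a noop.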
