How to scale an ejabberd server machine on CentOS to 200 K parallel connections

Posted 2019-02-25 05:02

Question:

I am working on a fairly powerful ejabberd instance: a machine with a 40-core CPU and 160 GB of RAM.

The issue is I am unable to scale up to 200 K parallel connections.

The sysctl config is as follows:

net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
#http://linux-ip.net/html/ether-arp.html#ether-arp-flux
net.ipv4.conf.all.arp_filter = 1
kernel.exec-shield=1
kernel.randomize_va_space=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.conf.all.accept_source_route=0
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.ip_local_port_range = 12000    65535
fs.nr_open = 20000500
fs.file-max = 1000000
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3
net.core.somaxconn = 10000

The /etc/security/limits.conf entries are as follows:

*               soft    core            900000
*               hard    rss             900000
*               soft    nofile          900000
*               hard    nofile          900000
*               soft    nproc           900000
*               hard    nproc           900000
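
Note that raising limits.conf alone does not guarantee that the running beam process actually gets these limits; a systemd unit or init script that starts ejabberd may impose its own nofile limit. As a quick sanity check (a sketch, assuming you can attach a shell to the live node, e.g. with ejabberdctl debug), you can ask the VM what it actually sees; exact field names can vary between OTP releases:

%% File descriptors the emulator believes it may use:
proplists:get_value(max_fds, erlang:system_info(check_io)).
%% Upper bound on ports (each TCP socket consumes one port):
erlang:system_info(port_limit).
%% Upper bound on Erlang processes (ejabberd spawns at least one per client connection):
erlang:system_info(process_limit).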

The machine starts to lose connections when the server reaches around 112 K.

Things that happen around 112 K:

  1. The CPU usage goes up to 200 ~ 300 % (but this is the usual spike).

     Background: when everything is normal, the CPU usage shoots up to about 80 %, with only two CPUs doing the actual work.

  2. I am unable to work on the machine. I am using the top and ss commands to see what is going on the server, but the machine just stops responding at this point and the connections begin to drop.

The saving grace is that the connections don't drop abruptly; they drop at roughly the same rate at which they were connected.

I am using Tsung to generate the load. There are 4 load-generator boxes hitting 4 different IPs that are all mapped to a single machine internally.

Any suggestions or opinions are very welcome.

Answer 1:

As a first step, you need to establish what the bottleneck is in your case:

  • CPU
  • Memory
  • System limits (open sockets, open files)
  • Application architecture

If possible, add a resource-tracking application to your node, e.g. recon. It will allow you to check the length of process message queues, memory fragmentation, and so on. In our production system, the amount of memory consumed by the Erlang VM as reported by the operating system differed from what the Erlang VM itself reported, due to Transparent Huge Pages (the system was virtualized). There may be other issues that are not obvious when inspecting the node with system tools alone.
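
For instance, a minimal sketch of that kind of memory cross-check, assuming recon is available on the node:

%% Memory the VM's allocators have actually obtained from the OS:
recon_alloc:memory(allocated).
%% Memory currently used by Erlang data:
recon_alloc:memory(used).
%% Ratio of used to allocated; a low value hints at fragmentation:
recon_alloc:memory(usage).

Comparing the allocated figure with the RSS that top or ps reports for the beam process is what exposes discrepancies like the Transparent Huge Pages case above.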

So I would propose:

  1. Determine the processes with the longest message queues; they are likely to be the ones slowing the system down, because the Erlang VM has to scan a process's whole inbox whenever that process does a selective receive (see the sketch after this list)

  2. Determine the processes with the largest amount of allocated memory

  3. Determine how much memory Erlang itself thinks is allocated
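
A minimal sketch of those three checks, assuming recon is loaded on the node (the 10 is just an illustrative count):

%% 1. Top 10 processes by message queue length:
recon:proc_count(message_queue_len, 10).
%% 2. Top 10 processes by allocated memory:
recon:proc_count(memory, 10).
%% 3. The VM's own view of its memory usage, broken down by category:
erlang:memory().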

Also, it would help if you shared the parameters used to start the Erlang VM.
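
For reference, the VM flags that usually matter for large connection counts look roughly like this (a sketch only; the exact flags and where to set them, for example ejabberdctl.cfg or vm.args, depend on your OTP release and ejabberd packaging, and ejabberd exposes some of these as ERL_PROCESSES and ERL_MAX_PORTS in ejabberdctl.cfg):

+P 2000000    # maximum number of Erlang processes
+Q 1000000    # maximum number of ports, i.e. sockets
+K true       # kernel poll (epoll); only relevant on older OTP releases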

Addition

I forgot to mention that it may be worth looking at the tuning WhatsApp did on their Erlang nodes to handle hundreds of thousands of simultaneous connections:

The WhatsApp Architecture Facebook Bought For $19 Billion