When I run my WebSocket test, I found the following interesting memory usage results:
Server stated, no connection
[{total,573263528},
{processes,17375688},
{processes_used,17360240},
{system,555887840},
{atom,472297},
{atom_used,451576},
{binary,28944},
{code,3774097},
{ets,271016}]
44 processes,
System:705M,
Erlang Residence:519M
100K Connections
[{total,762564512},
{processes,130105104},
{processes_used,130089656},
{system,632459408},
{atom,476337},
{atom_used,456484},
{binary,50160},
{code,3925064},
{ets,7589160}]
100044 processes,
System: 1814M,
Erlang Residence: 950M
200K Connections
( restart server and create from 0 connection, not continue from case 2)
[{total,952040232},
{processes,243161192},
{processes_used,243139984},
{system,708879040},
{atom,476337},
{atom_used,456484},
{binary,70856},
{code,3925064},
{ets,14904760}]
200044 processes,
System:3383M,
Erlang: 1837M
The figures with "System:" and "Erlang:" are provided htop, others are output of memory() call from erlang shell. Please look at the total and erlang residence memory. When there is no connection, these two are roughly same, with 100K connections, residence memory is a little larger than total, with 200K connections, residence memory is almost double the total.
Can anybody explain?
The most probable answer for your quersion is memory fragmentation.
Allocating OS memory is expensive, so Erlang tries to manage memory for you. When Erlang allocates memory, it creates an entity called "carrier", which consists of many "blocks". Erlang memory(total) reports the sum of all the block sizes (memory actually used). OS reports the sum of all carriers sizes (sum of memory used and preallocated). Both sum of blocks sizes and carrier sizes can be read from Erlang VM. If (block sizes)/(carrier sizes) << 1, than VM has hard time with freeing the carriers. There might be many big carriers with only couple of blocks used. You can read it with: erlang:system_info({allocator,Type}). but there is an easier way. You can check it using Recon library:
http://ferd.github.io/recon/recon_alloc.html
Firstly check:
and next:
This should explain the difference between total memory reported by Erlang VM and OS. If you are sending many small messages over websockets, you can decrease the fragmentation by running Erlang with 2 options:
First option will change the block allocation strategy from best fit to address order best fit, which could help squeeze more blocks into first carriers and second one decreases maximum multiblock carrier size, which makes carriers smaller (this should make freeing them easier).