We have an XMPP connection server that opens sockets to the GCM XMPP endpoints and starts sending notifications.
One thing we've noticed is that when sending a semi-large notification (to as few as 1,000 devices), the sockets keep getting disconnected suddenly with the following error message:
Client disconnected socket=b913-512-904dc69, code=EPIPE, errno=EPIPE, syscall=write
For example, this is the log from the live server when it starts sending a notification to different registration IDs:
- info: Sent downstream message msgId=P#c1uq... socketId=512
- info: Sent downstream message msgId=P#c3tE... socketId=512
- info: Sent downstream message msgId=P#c1TF... socketId=512
- info: Sent downstream message msgId=P#c3sy... socketId=512
- info: Sent downstream message msgId=P#c41N... socketId=512
...
- info: Sent downstream message msgId=P#cJbr... socketId=512
- info: Sent downstream message msgId=P#cJXO... socketId=512
info: Client disconnected socket=b913-512-904dc69, code=EPIPE, errno=EPIPE, syscall=write
This happens constantly across our whole system and makes service QA pretty difficult.
Another thing we've noticed is that calling socket.send(stanza) sometimes returns false, even when the socket is definitely connected. This one is even worse, since we have to re-queue the messages, which is really resource-heavy when sending millions of messages. This is explained in more detail below.
Additional Information:
- From the 1st message to the 84th (where the disconnection happens), less than 100 milliseconds pass.
- We have about 52 sockets open for this JID/password (sender ID and API key in GCM's terms), across 3 different servers. All of them disconnect now and then when a large notification task comes along (say, to 10,000 recipients).
- Sockets successfully reconnect, but they stay disconnected for several seconds, which reduces the efficiency and reliability of our system.
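One thing we have not ruled out is flow control. As far as we understand the CCS documentation, a connection should not have more than 100 unacknowledged downstream messages at a time, and 84 messages in under 100 milliseconds is close to that. Below is a minimal sketch of how the number of in-flight messages per socket could be capped. `PendingWindow`, `sendFn`, `enqueue` and `settle` are hypothetical names, not part of node-xmpp-client, and the limit of 100 is our reading of the docs rather than something we have confirmed causes the EPIPEs.

```javascript
// Hypothetical helper: caps unacknowledged downstream messages per connection.
// MAX_PENDING = 100 is an assumption based on the CCS flow-control docs.
const MAX_PENDING = 100;

class PendingWindow {
  constructor(sendFn, maxPending = MAX_PENDING) {
    this.sendFn = sendFn;        // e.g. (msg) => socket.send(stanzaFor(msg))
    this.maxPending = maxPending;
    this.pending = new Set();    // ids sent but not yet ACKed/NACKed
    this.waiting = [];           // messages held back until the window opens
  }

  // Send immediately if the window has room, otherwise hold the message.
  enqueue(msg) {
    if (this.pending.size < this.maxPending) {
      this.pending.add(msg.id);
      this.sendFn(msg);
    } else {
      this.waiting.push(msg);
    }
  }

  // Call from the stanza handler when an ACK or NACK for `id` arrives.
  settle(id) {
    if (!this.pending.delete(id)) return; // unknown or already settled id
    const next = this.waiting.shift();
    if (next) this.enqueue(next);
  }
}
```

With one window per socket, the stanza handler would call `settle(messageId)` for every ack or nack, so at most `maxPending` messages are ever outstanding on that connection.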
How the connection is set up:
const xmpp = require('node-xmpp-client');
let socket = new xmpp.Client({
  port: 5235,
  host: 'gcm-xmpp.googleapis.com',
  legacySSL: true,
  preferredSaslMechanism: 'PLAIN',
  reconnect: true,
  jid: $JID,
  password: $PASSWORD
});
socket.connection.socket.setTimeout(0);
socket.connection.socket.setKeepAlive(true, 10000);
socket.on('stanza', (stanza) => handleStanza(stanza));
...
Acks are sent for every upstream message received.
But one thing we see is that the following call sometimes returns false when sending downstream messages, even when the socket is connected:
// This returns false many times, even when socket.connection.connected === true!
socket.send(xmppStanza)
If this happens, we queue the ack message to be retried later but keep sending messages to GCM.
Why does socket.send return false sometimes? (This is obviously not an error like EPIPE; it's just false, meaning flushing the socket was unsuccessful. Maybe the socket becomes unwritable even though it's connected?)
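One possibility we are considering: if socket.send simply passes through the return value of Node's `socket.write()` (an assumption, we have not verified this in node-xmpp-client), then false would not mean the write failed. In Node's stream API it means the data was accepted and buffered, and the caller should pause until the `'drain'` event. A minimal sketch of that pattern against a plain Node.js `Writable` follows. `BackpressureQueue` is a hypothetical name, and whether the XMPP client exposes a usable `'drain'` event is also an assumption.

```javascript
const { Writable } = require('stream');

// Hypothetical wrapper: writes queued chunks to any Node.js Writable and
// pauses when write() returns false, resuming on the 'drain' event.
class BackpressureQueue {
  constructor(writable) {
    this.writable = writable;
    this.queue = [];       // chunks not yet handed to the stream
    this.blocked = false;  // true while waiting for 'drain'
    writable.on('drain', () => {
      this.blocked = false;
      this.flush();
    });
  }

  push(chunk) {
    this.queue.push(chunk);
    this.flush();
  }

  flush() {
    while (!this.blocked && this.queue.length > 0) {
      const chunk = this.queue.shift();
      // false means "buffered, slow down"; the chunk itself was still accepted,
      // so it must not be re-queued or sent again.
      if (!this.writable.write(chunk)) this.blocked = true;
    }
  }
}
```

If this reading is right, re-queueing a message after a false return would send it twice, and continuing to write regardless would only grow the socket's internal buffer.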
If acks are delayed, will GCM close the connection with the delayed acks or will it just stop sending upstreams?
(AFAIK, it'll just stop sending upstreams, so maybe this has nothing to do with the connections being closed (EPIPEs)?)
I'd be really grateful if anyone could shed some light on this behavior.
Thanks!