I had a terrible night trying to figure out what is going on with RabbitMQ and SpringXD, unfortunately without a success.
The problem: SpringXD closes RabbitMQ connections repeatedly, or reports warnings related to the channel cache size.
Fragment from the SpringXD log (during stream initialization/autowiring):
2016-05-03T07:42:43+0200 1.3.0.RELEASE WARN
DeploymentsPathChildrenCache-0 listener.SimpleMessageListenerContainer
- CachingConnectionFactory's channelCacheSize can not be less than the
number of concurrentConsumers so it was reset to match: 4
...
2016-05-03T07:54:17+0200 1.3.0.RELEASE ERROR AMQP Connection
192.168.120.125:5672 connection.CachingConnectionFactory - Channel shutdown: connection error
2016-05-03T17:38:58+0200 1.3.0.RELEASE ERROR AMQP Connection
192.168.120.125:5672 connection.CachingConnectionFactory - Channel shutdown: connection error; protocol method:
method<connection.close>(reply-code=504, reply-text=CHANNEL_ERROR -
second 'channel.open' seen, class-id=20, method-id=10)
Fragment from the RabbitMQ log:
=WARNING REPORT==== 3-May-2016::08:08:09 === closing AMQP connection <0.22276.61> (192.168.120.125:59350 -> 192.168.120.125:5672): client
unexpectedly closed TCP connection
=ERROR REPORT==== 3-May-2016::08:08:11 === closing AMQP connection 0.15409.61> (192.168.120.125:58527 -> 192.168.120.125:5672):
{writer,send_failed,{error,closed}}
state blocked error is rare
=ERROR REPORT==== 3-May-2016::17:38:58 === Error on AMQP connection <0.20542.25> (192.168.120.125:59421 -> 192.168.120.125:5672, vhost:
'/', user: 'xd', state: blocked), channel 7: operation channel.open
caused a connection exception channel_error: "second 'channel.open'
seen"
My setup (6 nodes)
- springxd 1.3.0 distributed (zookeeper)
- RabbitMQ 3.6.0, Erlang R16B03-1 cluster
ackMode: AUTO ## or NONE
autoBindDLQ: false
backOffInitialInterval: 1000
backOffMaxInterval: 10000
backOffMultiplier: 2.0
batchBufferLimit: 10000
batchingEnabled: false
batchSize: 200
batchTimeout: 5000
compress: false
concurrency: 4
deliveryMode: NON_PERSISTENT ## or PERSISTENT
durableSubscription: false
maxAttempts: 10
maxConcurrency: 10
prefix: xdbus.
prefetch: 1000
replyHeaderPatterns: STANDARD_REPLY_HEADERS,*
republishToDLQ: false
requestHeaderPatterns: STANDARD_REQUEST_HEADERS,*
requeue: true
transacted: false
txSize: 1000
spring: rabbitmq:
addresses:
priv1:5672,priv2:5672,priv3:5672,
priv4:5672,priv5:5672,priv6:5672
adminAddresses:
http://priv1:15672, http://priv2:15672, http://priv3:15672, http://priv4:15672, http://priv5:15672,http://priv6:15672
nodes:
rabbit@priv1,rabbit@priv2,rabbit@priv3,
rabbit@priv4,rabbit@priv5,rabbit@priv6
username: xd
password: xxxx
virtual_host: /
useSSL: false
ha-xdbus policy:
- ^xdbus\. all
- ha-mode: exactly
- ha-params: 2
- queue-master-locator: min-masters
Rabbit conf
[
{rabbit,
[
{tcp_listeners, [5672]},
{queue_master_locator, "min-masters"}
]
}
].
When ackMode is NONE the following happens:
Eventually the number of consumers drop to zero and I have a zombie streams that don't recover from that state, which in turn causes unwanted queueing.
When ackMode is AUTO the following happens:
Some messages left un-acked forever.
SpringXD streams and durable queues
Rabbit module is being used as source or sink, no custom autowiring.
Typical stream definitions are as follows:
Ingestion:
event_generator | rabbit --mappedRequestHeaders=XDRoutingKey --routingKey='headers[''XDRoutingKey'']'
Processing/Sink:
rabbit --queues='xdbus.INQUEUE-A' | ENRICHMENT-PROCESSOR-A | elastic-sink
rabbit --queues='xdbus.INQUEUE-B' | ENRICHMENT-PROCESSOR-B | elastic-sink
xdbus.INQUEUE-xxx are manually created from the Rabbit admin GUI. (durable)
GLOBAL statistics (from the RabbitMQ Admin)
- Connections: 190
- Channels: 2263 (Channel cache problem perhaps ?)
- Exchanges: 20
- Queues: 120
- Consumers : 1850
Finally:
I would appreciate if someone could answer what is wrong with the configuration (I am pretty sure the network is performing well, so there are no network problems and there is no problem related to max open files limitation).
Message rates vary from 2K/sec to max 30k/sec which is relative small load.
Thanks!
Ivan