We have a JEE app that runs about 40 partitioned jobs on a cluster. It can be deployed on both JBoss and WebSphere. We are experiencing two problems:
1. Messaging system failures in both JBoss and WebSphere, typically related to temporary queue connection problems.
2. Partitioned jobs that effectively hang because of lost messages.
I read a posting that switching the reply-destination of the outbound-gateway can improve robustness and allow for re-connection in the case of failures. The inbound-gateway basically starts 2 listeners on the requestsQueue:
<int-jms:inbound-gateway id="springbatch.inbound.gateway"
    connection-factory="springbatch.jmsConnectionFactory"
    request-channel="springbatch.slave.jms.request"
    request-destination="requestsQueue"
    reply-channel="springbatch.slave.jms.response"
    concurrent-consumers="2"
    max-concurrent-consumers="2"/>
Each job has a separate outbound-channel:
<int-jms:outbound-gateway
    connection-factory="springbatch.jmsConnectionFactory"
    request-channel="jms.channel.1"
    request-destination="requestsQueue"
    reply-channel="jms.channel.2"
    reply-destination="repliesQueue"
    correlation-key="JMSCorrelationID">
    <int-jms:reply-listener />
</int-jms:outbound-gateway>
It runs fine on a single server, but when run on a cluster the partitions are distributed around the cluster and the master step does not get an acknowledgement. I thought that using JMSCorrelationID as the correlation-key would handle matching up the JMS messages.
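To make my expectation concrete, here is a minimal plain-Java sketch of the matching I assumed the correlation-key provides. It is only an illustration of the idea, not Spring Integration's actual implementation; the class CorrelationSketch and its methods send and onReply are hypothetical names I made up for this post.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not how Spring Integration is implemented.
class CorrelationSketch {
    // Pending requests of ONE node, keyed by the correlation ID it sent.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Register a request before sending it; the future completes when its reply arrives.
    CompletableFuture<String> send(String correlationId) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        return reply;
    }

    // Called for every message this node consumes from the shared reply queue.
    // Returns false when the reply belongs to a request this node never sent.
    boolean onReply(String correlationId, String payload) {
        CompletableFuture<String> reply = pending.remove(correlationId);
        if (reply == null) {
            return false; // no matching request on this node
        }
        reply.complete(payload);
        return true;
    }
}
```

In this sketch, if a node other than the sender consumes a reply from the shared queue, onReply returns false there and the sender's future never completes. That would look like the hang I am seeing, although I have not confirmed that this is what actually happens in the cluster.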
Am I missing a configuration piece?