Infinispan JGroups crashes after WAR deployment

I'm working on Wildfly 9 with Infinispan 7.2.3.

I'm facing a strange problem related to the distributed cache:

On the application server I have N deployed WARs exposing REST services.
Each service checks whether a CacheManager is already present in JNDI. If so, it uses it; otherwise it creates a new one and binds it to JNDI. So every WAR works with the same single CacheManager instance.
The Infinispan CacheManager is configured in distributed mode.

Infinispan and JGroups are provided by the application server. After a redeploy (undeploy and deploy) of all the WARs, if I immediately start sending REST requests to these services, I get this error:

18:23:42,366 WARN  [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p2-t12) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: Replication timeout for ws-7-aor-58034
    at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:87)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:586)
    at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:402)
    at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:393)
    at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:309)
    at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:590)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

18:23:42,539 WARN  [org.infinispan.topology.ClusterTopologyManagerImpl] (remote-thread--p11-t2) ISPN000329: Unable to read rebalancing status from coordinator ws-7-aor-19211: org.infinispan.util.concurrent.TimeoutException: Node ws-7-aor-19211 timed out
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:248)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:561)
    at org.infinispan.topology.ClusterTopologyManagerImpl.fetchRebalancingStatusFromCoordinator(ClusterTopologyManagerImpl.java:129)
    at org.infinispan.topology.ClusterTopologyManagerImpl.start(ClusterTopologyManagerImpl.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
    at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
    at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:638)
    at org.infinispan.factories.AbstractComponentRegistry.registerComponentInternal(AbstractComponentRegistry.java:207)
    at org.infinispan.factories.AbstractComponentRegistry.registerComponent(AbstractComponentRegistry.java:156)
    at org.infinispan.factories.AbstractComponentRegistry.getOrCreateComponent(AbstractComponentRegistry.java:277)
    at org.infinispan.factories.AbstractComponentRegistry.invokeInjectionMethod(AbstractComponentRegistry.java:227)
    at org.infinispan.factories.AbstractComponentRegistry.wireDependencies(AbstractComponentRegistry.java:132)
    at org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler$2.run(GlobalInboundInvocationHandler.java:156)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.jgroups.TimeoutException: timeout waiting for response from ws-7-aor-19211, request: org.jgroups.blocks.UnicastRequest@75770aa6, req_id=6, mode=GET_ALL, target=ws-7-aor-19211
    at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:427)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:433)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:241)
    ... 19 more

This is the initialization code for the CacheManager:

    try {
        ctx = new InitialContext();
        cacheManager = (DefaultCacheManager) ctx.lookup(SessionConstants.CACHE_MANAGER_GLOBAL_JNDI_NAME);
    } catch (NamingException e1) {
        logger.error("SessionHooverJob not able to find: java:global/klopotekCacheManager ... a new instance will be created!");
    }

    if (cacheManager == null) {

        ...
        configurator = ConfiguratorFactory.getStackConfigurator("default-configs/default-jgroups-udp.xml");
        ProtocolConfiguration udpConfiguration = configurator.getProtocolStack().get(0);
        if ("UDP".equalsIgnoreCase(udpConfiguration.getProtocolName()) && mcastAddr != null) {
            udpConfiguration.getProperties().put("mcast_addr", mcastAddr);
        }
        GlobalConfigurationBuilder gcb = new GlobalConfigurationBuilder();
        gcb.globalJmxStatistics().enabled(true).allowDuplicateDomains(true);
        gcb.transport().defaultTransport()
            .addProperty(JGroupsTransport.CONFIGURATION_STRING, configurator.getProtocolStackString());
        //.addProperty(JGroupsTransport.CONFIGURATION_FILE, "config/jgroups.xml");

        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.clustering().cacheMode(CacheMode.DIST_SYNC).expiration().lifespan(24L, TimeUnit.HOURS);

        cacheManager = new DefaultCacheManager(gcb.build(), builder.build());

The problem doesn't occur if around 40-60 seconds pass after deploying. If I have one JNDI-bound manager that has already built the JGroups channel, then even if I undeploy all the WARs, why does JGroups try to rebalance again?

Is there some configuration property to set?

Answer 1:

There's nothing wrong with using the caches from WildFly's Infinispan subsystem, even via JNDI, so long as you are aware of the lifecycle requirements and constraints of server-managed Infinispan resources. In WildFly, all Infinispan resources are created and started on demand, including cache managers, cache configurations, and caches. If no service requires a given Infinispan resource, it is not started (nor is it bound to JNDI). Likewise, when no service requires a given Infinispan resource any longer, it is stopped (and its JNDI binding removed). Thus, in order to look up an Infinispan resource via JNDI, you must first force it to start. The easiest way to do this is to create a resource reference (i.e. a resource-ref or resource-env-ref). For example:

<resource-ref>
    <res-ref-name>infinispan/mycontainer</res-ref-name>
    <lookup-name>java:jboss/infinispan/container/mycontainer</lookup-name>
</resource-ref>

You can now look up the cache manager in your application's JNDI namespace. For example:

Context ctx = new InitialContext();
EmbeddedCacheManager manager = (EmbeddedCacheManager) ctx.lookup("java:comp/env/infinispan/mycontainer");

The cache manager will already be started. Also, you should never attempt to stop a server-managed cache manager. Additionally, you cannot guarantee that any of the cache configurations defined within the Infinispan subsystem for this container are installed. Thus, calling the getCache("...") methods is not a reliable way of obtaining a reference to a server-managed cache. If you want to depend on a specific cache as defined in the subsystem, create a resource reference for the cache itself. For example:

<resource-ref>
    <res-ref-name>infinispan/mycache</res-ref-name>
    <lookup-name>java:jboss/infinispan/cache/mycontainer/mycache</lookup-name>
</resource-ref>

You can now look up the cache directly.

Cache<?, ?> cache = (Cache<?, ?>) ctx.lookup("java:comp/env/infinispan/mycache");

The cache will already be started. Likewise, you should not attempt to stop a server-managed cache. It will stop automatically when your application is undeployed or the server is shut down.
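
To make the on-demand lifecycle described above more concrete, here is a minimal conceptual sketch in plain Java. It is not WildFly's actual implementation; the class OnDemandResource and its methods acquire and release are hypothetical names invented for illustration. It only models the rule stated above: a resource is started when the first deployment depends on it and stopped when the last dependent goes away, so undeploying every dependent and deploying again causes a fresh start.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical model of a server-managed resource that only runs
 * while at least one deployment depends on it. Not a WildFly API.
 */
final class OnDemandResource {
    private final String name;
    private final Set<String> dependents = new LinkedHashSet<>();
    private boolean started;
    private int startCount;

    OnDemandResource(String name) {
        this.name = name;
    }

    /** A deployment declares a dependency (for example via a resource-ref). */
    synchronized void acquire(String deployment) {
        dependents.add(deployment);
        if (!started) {
            // In a real server this is where the resource would start
            // and its JNDI binding would be created.
            started = true;
            startCount++;
        }
    }

    /** A deployment is undeployed and no longer needs the resource. */
    synchronized void release(String deployment) {
        dependents.remove(deployment);
        if (started && dependents.isEmpty()) {
            // Last dependent gone: the resource stops and is unbound.
            started = false;
        }
    }

    synchronized boolean isStarted() {
        return started;
    }

    synchronized int getStartCount() {
        return startCount;
    }

    String getName() {
        return name;
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        OnDemandResource container = new OnDemandResource("mycontainer");

        container.acquire("a.war");
        container.acquire("b.war");
        // prints "started=true, starts=1"
        System.out.println("started=" + container.isStarted() + ", starts=" + container.getStartCount());

        // Undeploy everything: the last release stops the resource.
        container.release("a.war");
        container.release("b.war");
        // prints "started=false, starts=1"
        System.out.println("started=" + container.isStarted() + ", starts=" + container.getStartCount());

        // Redeploy: the resource has to start again from scratch.
        container.acquire("a.war");
        // prints "started=true, starts=2"
        System.out.println("started=" + container.isStarted() + ", starts=" + container.getStartCount());
    }
}
```

In this model, undeploying a single WAR while others remain deployed leaves the resource running, whereas undeploying all of them stops it, and the next deployment pays the full startup cost again.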

Answer 2:

You are not supposed to use the Infinispan/JGroups libraries provided by Wildfly, and JNDI is not really the recommended way to share Cache/CacheManager instances.

Instead, you should deploy your own Infinispan/JGroups version, and then use things like CDI to inject the CacheManager where you need it. This quickstart shows you how you can do that using JBoss Data Grid, which is the supported version of Infinispan.

The repository contains other quickstarts such as this one centered on CDI injection of Infinispan Cache and JSR-107 Cache instances.