“experimental” status of JGroups Master/Slave backend

Posted 2019-04-16 14:07

Question:

We use hibernate-search for full-text indexing of our entities in WildFly 8.2 (using the hibernate/hibernate-search and infinispan libraries included with WildFly 8.2). Running as a standalone node or in a domain with a dedicated hibernate-search master and the org.hibernate.search.store.impl.FSDirectoryProvider, this has been working fine for several years (and JBoss versions).

We would now like to deploy this system to an HA clustered environment with WildFly 8.2 running behind a load-balancing proxy. We want a dynamically scalable cluster with no single point of failure in the form of a domain master or a hibernate-search master, and have therefore configured standalone nodes without a domain. To elect the HS master we use the JGroups backend, and to replicate the hibernate-search index data we use the Infinispan directory provider with a file-store to persist the data between restarts.

I got this up and running rather quickly and was quite excited, since it seems like a robust and scalable scenario, but I am somewhat hesitant to put this configuration into production because the JGroups backend has been dubbed "experimental" (and in some forums "extremely experimental"). What is the current status of the backend? Is anyone currently using it in production? What can we do to minimize the risk of using this configuration?

Also, does anyone have experience with using Infinispan together with hibernate-search in this constellation? Most of the settings for the replicated caches were simply re-used from existing examples; if anyone has tips or advice regarding these settings, e.g. whether it will scale to indexes of ~50 GB, I would be most thankful for any feedback or experience with similar scenarios.

The configuration was mostly put together using reference material from here:

  • http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/
  • https://forum.hibernate.org/viewtopic.php?f=9&t=1035437

The detailed steps we have taken are included below.

  • As a basis, we took and extended the standalone-ha-full.xml
  • configured JGroups to use the TCP stack
  • run TCPPING instead of MPING (we plan to run this in a cloud context where multicast/UDP causes issues; we may move to JDBC_PING to make this more flexible at some point); see the JGroups subsystem sketch after the system properties below
  • we run each node with the following system properties (name/port change per node, of course)

System-properties:

<system-properties>        
   <property name="jboss.node.name" value="node2" />  
   <property name="jboss.socket.binding.port-offset" value="889" />  
   <!-- Automatic master election via JGroups, requires Infinispan directory provider -->
   <property name="hibernate.search.default.worker.backend" value="jgroups"/>
   <!-- Enable cluster-replicated index, but the default configuration does not enable any 
   form of permanent persistence for the index, we do this with cache-container/file-store below  -->
   <property name="hibernate.search.default.directory_provider" value="infinispan" />
   <property name="hibernate.search.infinispan.chunk_size" value="300000000" />
   <property name="hibernate.search.reader.strategy" value="shared" />
   <property name="hibernate.search.worker.execution" value="sync" />
   <property name="hibernate.search.default.optimizer.operation_limit.max"    value="10000"/>
   <property name="hibernate.search.default.optimizer.transaction_limit.max"    value="1000"/>
   <!-- Use CacheManager defined in WildFly configuration file, e.g., standalone.xml -->
   <property name="hibernate.search.infinispan.cachemanager_jndiname" value="java:jboss/infinispan/container/hibernate-search"/>
</system-properties>     
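
For reference, this is roughly what the JGroups subsystem change (TCP stack plus TCPPING instead of MPING) looks like in our standalone-ha-full.xml. This is a minimal sketch based on the WildFly 8.2 defaults; the host names node1/node2 and the port 8489 (i.e. the default jgroups-tcp port 7600 plus the port-offset of 889 shown above) are placeholders and must be adapted to the actual topology:

<subsystem xmlns="urn:jboss:domain:jgroups:2.0" default-stack="tcp">
    <stack name="tcp">
        <transport type="TCP" socket-binding="jgroups-tcp"/>
        <!-- TCPPING replaces the default MPING so no multicast is needed;
             initial_hosts must list the jgroups-tcp port of every node,
             including its port-offset (placeholder hosts/ports) -->
        <protocol type="TCPPING">
            <property name="initial_hosts">node1[7600],node2[8489]</property>
            <property name="port_range">0</property>
        </protocol>
        <protocol type="MERGE2"/>
        <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
        <protocol type="FD"/>
        <protocol type="VERIFY_SUSPECT"/>
        <protocol type="pbcast.NAKACK2"/>
        <protocol type="UNICAST3"/>
        <protocol type="pbcast.STABLE"/>
        <protocol type="pbcast.GMS"/>
        <protocol type="MFC"/>
        <protocol type="FRAG2"/>
        <protocol type="RSVP"/>
    </stack>
</subsystem>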

We have defined the following <cache-container> for infinispan:

<!-- BEGIN HIBERNATE INFINISPAN CACHE -->
<cache-container name="hibernate-search" jndi-name="java:jboss/infinispan/container/hibernate-search" start="EAGER">
    <transport lock-timeout="330000"/>
    <replicated-cache name="LuceneIndexesMetadata" start="EAGER" mode="SYNC" remote-timeout="330000">
        <locking striping="false" acquire-timeout="330000" concurrency-level="500"/>
        <transaction mode="NONE"/>
        <eviction strategy="NONE" max-entries="-1"/>
        <expiration max-idle="-1"/>
        <state-transfer enabled="true" timeout="480000"/>
        <file-store preload="true" purge="false" passivation="false" relative-to="jboss.home.dir" path="..\namespaces\mc\infinispan-file-store">
            <write-behind/>
        </file-store>
        <indexing index="NONE"/>
    </replicated-cache>
    <replicated-cache name="LuceneIndexesData" start="EAGER" mode="SYNC" remote-timeout="25000">
        <locking striping="false" acquire-timeout="330000" concurrency-level="500"/>
        <transaction mode="NONE"/>
        <eviction strategy="NONE" max-entries="-1"/>
        <expiration max-idle="-1"/>
        <state-transfer enabled="true" timeout="480000"/>
       <file-store preload="true" purge="false" passivation="false" relative-to="jboss.home.dir" path="..\namespaces\mc\infinispan-file-store">
            <write-behind/>
        </file-store>
        <indexing index="NONE"/>
    </replicated-cache>
    <replicated-cache name="LuceneIndexesLocking" start="EAGER" mode="SYNC" remote-timeout="25000">
        <locking striping="false" acquire-timeout="330000" concurrency-level="500"/>
        <transaction mode="NONE"/>
        <eviction strategy="NONE" max-entries="-1"/>
        <expiration max-idle="-1"/>
        <state-transfer enabled="true" timeout="480000"/>
        <indexing index="NONE"/>
    </replicated-cache>
</cache-container>
<!-- END HIBERNATE INFINISPAN CACHE -->

It is my understanding (and it seems to hold in practice in my tests) that Infinispan serializes its data to the configured <file-store> and that the data is then retained between node restarts. Even some catastrophic tests (e.g. kill -9 <jboss-pid>) have shown the index to recover cleanly when the node comes back up. During the offline period, another node takes over as master and the cluster runs smoothly.