I am using Spring Batch for data migration from XML to Oracle Database.
With Single Thread execution, process takes 80-90 Mins to insert 20K users approx.
I want to reduce it to more than half but even using Multi File Resource, I am not able to achieve that.
I have a single XML to be processed so I started simply by adding
- task executor and making Reader synchronized but not able to achieve gain.
So what I am doing, I split XML into multiple XMLS and want to try with Multi File Resource. Here is the configuration.
<batch:job id="importJob">
<batch:step id="step1Master">
<batch:partition handler="handler" partitioner="partitioner" />
</batch:step>
</batch:job>
<bean id="handler"
class="org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler">
<property name="taskExecutor" ref="taskExecutor" />
<property name="step" ref="slaveStep" />
<property name="gridSize" value="20" />
</bean>
<batch:step id="slaveStep">
<batch:tasklet transaction-manager="transactionManager"
allow-start-if-complete="true">
<batch:chunk reader="reader" writer="writer"
processor="processor" commit-interval="1000" skip-limit="1500000">
<batch:skippable-exception-classes>
<batch:include class="java.lang.Exception" />
</batch:skippable-exception-classes>
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="100" />
<property name="maxPoolSize" value="300" />
<property name="allowCoreThreadTimeOut" value="true" />
</bean>
<bean id="partitioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
scope="step">
<property name="keyName" value="inputFile" />
<property name="resources"
value="file:/.../*.xml" />
</bean>
<bean id="processor"
class="...Processor"
scope="step" />
<bean id="reader" class="org.springframework.batch.item.xml.StaxEventItemReader"
scope="step">
<property name="fragmentRootElementName" value="user" />
<property name="unmarshaller" ref="userDetailUnmarshaller" />
<property name="resource" value="#{stepExecutionContext[inputFile]}" />
</bean>
My Single XML file contains users around 1000 and I am trying by having 20 files.
I kept commit-interval=1000 as each file has 1000 records to be insert in DB.
Do commit-interval needs to adjusted accordingly?
I am using ORACLE DB, Do I need to do any pool management there. Current Pool of ORACLE DB configured in JBOSS Min Pool = 100 Max Pool = 300
I see logging like
17:01:50,553 DEBUG [Writer] (taskExecutor-11) [UserDetailWriter] | user added
17:01:50,683 DEBUG [Writer] (taskExecutor-15) [UserDetailWriter] | user added
17:01:51,093 DEBUG [Writer] (taskExecutor-11) [UserDetailWriter] | user added
17:01:59,795 DEBUG [Writer] (taskExecutor-12) [UserDetailWriter] | user added
17:02:00,385 DEBUG [Writer] (taskExecutor-12) [UserDetailWriter] | user added
17:02:00,385 DEBUG [Writer] (taskExecutor-12) [UserDetailWriter] | user added
It seems multiple threads are being created but still I am not seeing any performance improvement here?
Please suggest what I am doing wrong?