Spring Batch Stax XML reading job is not ending wh

Posted 2019-04-14 06:39

Question:

I'm using Spring Batch to set up a job that will process a potentially very large XML file. I think I've set it up appropriately, but at runtime I'm finding that the job runs, processes its input, and then just hangs in an executing state (I can confirm by viewing the JobExecution's status in the JobRepository).

I've read through the Batch documentation several times but I don't see any obvious "make the job stop when out of input" configuration that I'm missing.

Here's the relevant portion of my application context:

<batch:job id="processPartnerUploads" restartable="true">
    <batch:step id="processStuffHoldings">
        <batch:tasklet>
            <batch:chunk reader="stuffReader" writer="stuffWriter" commit-interval="1"/>
        </batch:tasklet>        
    </batch:step>
</batch:job>

<bean id="stuffReader" class="org.springframework.batch.item.xml.StaxEventItemReader">
    <property name="fragmentRootElementName" value="stuff" />
    <property name="resource" value="file:///path/to/file.xml" />
    <property name="unmarshaller" ref="stuffUnmarshaller" />
</bean>

<bean id="stuffUnmarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="contextPath" value="com.company.project.xmlcontext"/>
</bean>

<bean id="stuffWriter" class="com.company.project.batch.StuffWriter" />

In case it matters, the "StuffWriter" is just a class that logs the items that would be written.

Please let me know if I've missed some important nuance involved with Batch and/or Stax.

Answer 1:

I've resolved this problem for myself, though I'm surprised by what I had to do. Debugging through StaxEventItemReader, I noticed that the inner loop in the moveCursorToNextFragment() method would spin forever once the end of my document was reached. Here's the relevant code:

while (true) {
    while (reader.peek() != null && !reader.peek().isStartElement()) {
        reader.nextEvent();
    }
    if (reader.peek() == null) {
        return false;
    }
    QName startElementName = ((StartElement) reader.peek()).getName();
    if (startElementName.getLocalPart().equals(fragmentRootElementName)) {
        if (fragmentRootElementNameSpace == null
                || startElementName.getNamespaceURI().equals(fragmentRootElementNameSpace)) {
            return true;
        }
    }
    reader.nextEvent();
}

reader.peek() was never returning null. It seemed to me like this code should be checking to see if the XMLEvent encountered during peek() is at the end of the document, but this wasn't so simple due to the StaxEventItemReader's reliance on a DefaultFragmentEventReader wrapping the standard XMLEventReader.

What I wound up doing was rolling my own ItemReader based on StaxEventItemReader but without using a FragmentEventReader at all, and then adjusting the inner loop code to read like so:

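        // Stop at the end of the document instead of waiting for peek() to return null.
        if (reader.peek().getEventType() == XMLStreamConstants.END_DOCUMENT) {
            return false;
        }
        // Otherwise skip this event and keep scanning.
        reader.nextEvent();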

That works perfectly and allows my batch job to go to COMPLETED at the end of input.
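
For anyone who wants to try the end-of-document check in isolation, here is a minimal sketch that uses only the JDK's javax.xml.stream API. It is not the Spring Batch source and not my actual reader. The class name EndOfDocumentSketch and the countFragments helper are hypothetical, and the namespace check from the original method is left out for brevity.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical standalone sketch; not part of Spring Batch.
final class EndOfDocumentSketch {

    // Advances the reader until the next start element with the given local name.
    // Returns false when the input is exhausted, whether that shows up as a
    // null peek() or as an END_DOCUMENT event.
    static boolean moveCursorToNextFragment(XMLEventReader reader, String fragmentName)
            throws XMLStreamException {
        while (true) {
            XMLEvent next = reader.peek();
            if (next == null || next.getEventType() == XMLStreamConstants.END_DOCUMENT) {
                return false;
            }
            if (next.isStartElement()
                    && next.asStartElement().getName().getLocalPart().equals(fragmentName)) {
                return true;
            }
            reader.nextEvent(); // not a matching start element: skip it
        }
    }

    // Counts start elements named fragmentName in the given XML string.
    static int countFragments(String xml, String fragmentName) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            int count = 0;
            while (moveCursorToNextFragment(reader, fragmentName)) {
                reader.nextEvent(); // consume the start element so the loop makes progress
                count++;
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><stuff>a</stuff><other/><stuff>b</stuff></root>";
        System.out.println(countFragments(xml, "stuff"));
    }
}
```

Checking for both null and END_DOCUMENT means the loop terminates no matter which of the two signals the underlying reader uses for end of input.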

I'm really surprised that I had to do this, though. I wondered if the underlying implementation of the streaming XML libraries I was using was at fault, but I'm using stax2-api-3.0.1.jar as referenced in the Spring Batch dependency list.

I also found that I'm not alone.