Use of the text() function when using XPath in dom4j

Posted 2019-09-02 08:36

Question:

I have inherited an application that parses XML using dom4j and XPath.

The XML being parsed is similar to the following:

<cache>
  <content>
    <transaction>
      <page>
        <widget name="PAGE_ID">WRK_REGISTRATION</widget>
        <widget name="TRANS_DETAIL_ID">77145</widget>
        <widget name="GRD_ERRORS" />
      </page>
      <page>
        <widget name="PAGE_ID">WRK_REGISTRATION</widget>
        <widget name="TRANS_DETAIL_ID">77147</widget>
        <widget name="GRD_ERRORS" />
      </page>
      <page>
        <widget name="PAGE_ID">WRK_PROCESSING</widget>
        <widget name="TRANS_DETAIL_ID">77152</widget>
        <widget name="GRD_ERRORS" />
      </page>
    </transaction>
  </content>
</cache>

Individual nodes are looked up with the following code:

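import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

String xPathToGridErrorNode = "//cache/content/transaction/page/widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']/../widget[@name='TRANS_DETAIL_ID'][text()='77147']/../widget[@name='GRD_ERRORS_TEMP']";

// Parse the XML string and take its root element
SAXReader reader = new SAXReader();
Document document = reader.read(new BufferedInputStream(new ByteArrayInputStream(xmlToParse.getBytes())));
Element root = document.getRootElement();

// Look up the error-grid widget for the given PAGE_ID and TRANS_DETAIL_ID
Node gridNode = root.selectSingleNode(xPathToGridErrorNode);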

where xmlToParse is a String containing XML similar to the excerpt above.

The code is trying to obtain the GRD_ERRORS node of the page whose PAGE_ID and TRANS_DETAIL_ID are given in the XPath.

This selectSingleNode call fails intermittently: in roughly 1-2% of cases it returns null, even though the requested node is present in the XML being searched.

I know there are some gotchas with using text()= in XPath, and I am wondering whether there is a better way to write the XPath expression for this kind of search.
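For illustration, here is a sketch of two well-known text()= pitfalls in XPath 1.0. It uses hypothetical variants of one page from the sample and the JDK's built-in javax.xml.xpath API in place of dom4j (so it runs without extra jars); the class, constant, and method names are made up. text() selects each text-node child separately, so a value split by a comment matches no single text node, while . compares the element's whole string value. Neither text() nor . trims surrounding whitespace; normalize-space() does. The question does not say whether the real data contains either case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextPredicateGotchas {
    // Hypothetical variants of one <page>: the TRANS_DETAIL_ID value is split
    // by a comment, or padded with whitespace and newlines
    static final String SPLIT_BY_COMMENT =
        "<page><widget name=\"TRANS_DETAIL_ID\">77<!-- note -->147</widget></page>";
    static final String PADDED =
        "<page><widget name=\"TRANS_DETAIL_ID\">\n  77147\n</widget></page>";

    static final String BY_TEXT = "//widget[@name='TRANS_DETAIL_ID'][text()='77147']";
    static final String BY_VALUE = "//widget[@name='TRANS_DETAIL_ID'][.='77147']";
    static final String BY_NORMALIZED =
        "//widget[@name='TRANS_DETAIL_ID'][normalize-space()='77147']";

    // Returns true if the XPath 1.0 expression selects at least one node
    static boolean matches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate("boolean(" + xpath + ")", doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // text() tests "77" and "147" separately; neither equals "77147"
        System.out.println(matches(SPLIT_BY_COMMENT, BY_TEXT));   // false
        // . uses the element's string value, which leaves out the comment
        System.out.println(matches(SPLIT_BY_COMMENT, BY_VALUE));  // true
        // Neither text() nor . strips the surrounding whitespace...
        System.out.println(matches(PADDED, BY_TEXT));             // false
        System.out.println(matches(PADDED, BY_VALUE));            // false
        // ...but normalize-space() does
        System.out.println(matches(PADDED, BY_NORMALIZED));       // true
    }
}
```

So [.='…'] or [normalize-space()='…'] are the more forgiving predicates; which one fits depends on what the real data looks like.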

Answer 1:

Your snippets are inconsistent: the XML contains GRD_ERRORS and WRK_REGISTRATION, but the XPath asks for GRD_ERRORS_TEMP and WRK_DNA_REGISTRATION.
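To make the mismatch concrete, here is a small sketch that evaluates the posted XPath against the sample XML. It assumes the JDK's javax.xml.xpath API as a stand-in for dom4j's selectSingleNode, on the assumption that both evaluate these XPath 1.0 expressions the same way; the class, constant, and method names are hypothetical. With the names as posted nothing is selected, and with names taken from the XML the GRD_ERRORS widget of the 77147 page is found.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NameMismatchDemo {
    // The sample XML from the question, without indentation
    static final String XML =
        "<cache><content><transaction>"
        + "<page><widget name=\"PAGE_ID\">WRK_REGISTRATION</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77145</widget>"
        + "<widget name=\"GRD_ERRORS\" /></page>"
        + "<page><widget name=\"PAGE_ID\">WRK_REGISTRATION</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77147</widget>"
        + "<widget name=\"GRD_ERRORS\" /></page>"
        + "<page><widget name=\"PAGE_ID\">WRK_PROCESSING</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77152</widget>"
        + "<widget name=\"GRD_ERRORS\" /></page>"
        + "</transaction></content></cache>";

    // The XPath exactly as posted in the question
    static final String AS_POSTED = "//cache/content/transaction/page"
        + "/widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']"
        + "/../widget[@name='TRANS_DETAIL_ID'][text()='77147']"
        + "/../widget[@name='GRD_ERRORS_TEMP']";

    // The same path, with names that actually occur in the XML
    static final String MATCHING_NAMES = AS_POSTED
        .replace("WRK_DNA_REGISTRATION", "WRK_REGISTRATION")
        .replace("GRD_ERRORS_TEMP", "GRD_ERRORS");

    // Evaluates an XPath 1.0 expression with javax.xml.xpath (standing in for
    // dom4j's selectSingleNode) and returns the first selected node, or null
    static Node select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, AS_POSTED));     // null
        Element hit = (Element) select(XML, MATCHING_NAMES);
        System.out.println(hit.getAttribute("name"));   // GRD_ERRORS
    }
}
```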

Setting that aside, I would suggest rewriting

//cache/content/transaction/page
  /widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']
  /../widget[@name='TRANS_DETAIL_ID'][text()='77147']
  /../widget[@name='GRD_ERRORS_TEMP']

as

//cache/content/transaction/page
  [widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]
  [widget[@name='TRANS_DETAIL_ID'][text()='77147']]
  /widget[@name='GRD_ERRORS']

This is purely a readability suggestion: in my eyes it is easier to read and expresses more clearly what you seem to mean, “the page element that has children matching these conditions, and then its widget with this @name.” Or, if this is closer to how you think about it:

//cache/content/transaction/page/widget[@name='GRD_ERRORS']
  [preceding-sibling::widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]
  [preceding-sibling::widget[@name='TRANS_DETAIL_ID'][text()='77147']]
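Both rewrites can be checked against the sample XML with a sketch like the one below, again assuming the JDK's javax.xml.xpath API instead of dom4j and using hypothetical class, constant, and method names. It also shows one difference between the two forms: preceding-sibling:: only sees widgets that come before GRD_ERRORS in document order, so the second form assumes GRD_ERRORS is the last widget in its page, as it is in the sample. The page-predicate form does not depend on widget order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RewrittenXPathDemo {
    // The sample XML from the question, without indentation
    static final String XML =
        "<cache><content><transaction>"
        + "<page><widget name=\"PAGE_ID\">WRK_REGISTRATION</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77145</widget>"
        + "<widget name=\"GRD_ERRORS\" /></page>"
        + "<page><widget name=\"PAGE_ID\">WRK_REGISTRATION</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77147</widget>"
        + "<widget name=\"GRD_ERRORS\" /></page>"
        + "<page><widget name=\"PAGE_ID\">WRK_PROCESSING</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77152</widget>"
        + "<widget name=\"GRD_ERRORS\" /></page>"
        + "</transaction></content></cache>";

    // Hypothetical page in which GRD_ERRORS comes before the other widgets
    static final String REORDERED = "<cache><content><transaction><page>"
        + "<widget name=\"GRD_ERRORS\" />"
        + "<widget name=\"PAGE_ID\">WRK_REGISTRATION</widget>"
        + "<widget name=\"TRANS_DETAIL_ID\">77147</widget>"
        + "</page></transaction></content></cache>";

    // First form: filter the page by its children, then step to the widget
    static final String PAGE_PREDICATES = "//cache/content/transaction/page"
        + "[widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]"
        + "[widget[@name='TRANS_DETAIL_ID'][text()='77147']]"
        + "/widget[@name='GRD_ERRORS']";

    // Second form: select the widget, then check its preceding siblings
    static final String SIBLING_PREDICATES =
        "//cache/content/transaction/page/widget[@name='GRD_ERRORS']"
        + "[preceding-sibling::widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]"
        + "[preceding-sibling::widget[@name='TRANS_DETAIL_ID'][text()='77147']]";

    // Evaluates the expression and returns the TRANS_DETAIL_ID of the page
    // containing the selected node, or null if nothing was selected
    static String transDetailIdOfMatch(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node node = (Node) xp.evaluate(xpath, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            return xp.evaluate("../widget[@name='TRANS_DETAIL_ID']", node);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transDetailIdOfMatch(XML, PAGE_PREDICATES));          // 77147
        System.out.println(transDetailIdOfMatch(XML, SIBLING_PREDICATES));       // 77147
        System.out.println(transDetailIdOfMatch(REORDERED, PAGE_PREDICATES));    // 77147
        System.out.println(transDetailIdOfMatch(REORDERED, SIBLING_PREDICATES)); // null
    }
}
```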


Tags: xpath dom4j