Differences between // and /descendant in XPath se

Posted 2019-02-22 12:09

Question:

I can't clearly understand the differences between using //element and /descendant::element when selecting multiple children of a base element in XPath.

Given this HTML snippet

<html>
<body>
<div class="popupContent">
  <table>
    <tr class="aclass"><td> Hello </td> <td> <input type="text" value="FIRST" /> </td></tr>
    <tr class="aclass"><td> Goodbye </td> <td> <input type="text" value="SECOND" /> </td></tr>
  </table>
</div>
</body>
</html>

I need to select each input based on its position in the table:

  //div[@class='popupContent']//input[1] selects the first input
  //div[@class='popupContent']//input[2] gives an error
  //div[@class='popupContent']/descendant::input[1] again selects the first input
  //div[@class='popupContent']/descendant::input[2] selects the second input

Using /descendant::input does what I need: grab all inputs and let me select by position.
How does // differ? Why does it return only the first element and not the ones after?

I'm aware of this other question, but the answer basically says they're aliases and points to the documentation, which I cannot understand and which lacks a concrete example.

Answer 1:

The only difference between //X and /descendant::X arises when X contains a positional predicate, for example //x[1] vs /descendant::x[1]. In this situation //x[1] selects every x element that is the first x child of its parent element, whereas /descendant::x[1] selects the first descendant x overall. You can work this out by remembering that //x[1] is short for /descendant-or-self::node()/child::x[1].
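A small check of this claim against the question's markup, as a sketch rather than a definitive test. It assumes Python's standard xml.etree.ElementTree, which implements only a subset of XPath: it accepts .//input[1] but has no descendant:: axis, so the "first descendant overall" case is emulated by listing the inputs in document order and indexing. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The table from the question (well-formed, so an XML parser can read it).
DOC = """<div class="popupContent">
  <table>
    <tr class="aclass"><td> Hello </td> <td> <input type="text" value="FIRST" /> </td></tr>
    <tr class="aclass"><td> Goodbye </td> <td> <input type="text" value="SECOND" /> </td></tr>
  </table>
</div>"""

div = ET.fromstring(DOC)

# .//input[1]: every input that is the first input child of its own parent.
first_per_parent = [e.get("value") for e in div.findall(".//input[1]")]

# .//input[2]: every input that is the second input child of its parent.
second_per_parent = [e.get("value") for e in div.findall(".//input[2]")]

# Emulation of descendant::input[n] (assumption: ElementTree has no such axis):
# collect all descendant inputs in document order, then pick by index.
all_inputs = list(div.iter("input"))
first_overall = all_inputs[0].get("value")
second_overall = all_inputs[1].get("value")

print(first_per_parent, second_per_parent, first_overall, second_overall)
```

Here .//input[1] matches both inputs, because each one is the first input inside its own td, while .//input[2] matches nothing. That would be consistent with the question's observations if the tool only returns the first match and raises an error when there is none.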



Answer 2:

According to XPath 1.0, §2.5 Abbreviated Syntax:

// is short for /descendant-or-self::node()/

So div[@class='popupContent']//input[1] (same as div[@class='popupContent']/descendant-or-self::node()/child::input[1]) will:

  1. visit the divs with the "popupContent" class and all of their descendants (children, children of children and so on),
  2. then look for <input> children of each of those nodes,
  3. and finally keep each <input> that is the first <input> child of its parent (the [1] predicate).

div[@class='popupContent']//input[2] is very similar, except that the last step keeps the second <input> child of each parent. None of the <input>s is the second <input> child of its parent, so nothing is selected.

div[@class='popupContent']/descendant::input[2] on the other hand will:

  1. go to all descendants of the divs with that class,
  2. select only the <input> elements and build a node-set out of them,
  3. and finally select the 2nd element of that node-set, in document order.
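The two step lists above can be sketched in plain Python to make the difference concrete. This is an emulation under stated assumptions, not an XPath engine: double_slash and descendant_axis are hypothetical helper names, only element nodes are considered, and the standard xml.etree.ElementTree module is used just to parse the snippet and walk the tree.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<body>
<div class="popupContent">
  <table>
    <tr class="aclass"><td> Hello </td> <td> <input type="text" value="FIRST" /> </td></tr>
    <tr class="aclass"><td> Goodbye </td> <td> <input type="text" value="SECOND" /> </td></tr>
  </table>
</div>
</body>
</html>"""

root = ET.fromstring(DOC)
div = root.find(".//div[@class='popupContent']")


def double_slash(context, tag, n):
    """Emulate context//tag[n], i.e. descendant-or-self::node()/child::tag[n]."""
    selected = []
    # iter() yields the context element itself, then its descendants, in document order.
    for node in context.iter():
        children = [child for child in node if child.tag == tag]
        # The predicate is applied per parent: keep that parent's n-th matching child.
        if len(children) >= n:
            selected.append(children[n - 1])
    return selected


def descendant_axis(context, tag, n):
    """Emulate context/descendant::tag[n]."""
    # One node-set for the whole subtree (the context itself is excluded), then index once.
    matches = [e for e in context.iter(tag) if e is not context]
    return matches[n - 1:n]


for n in (1, 2):
    print(n,
          [e.get("value") for e in double_slash(div, "input", n)],
          [e.get("value") for e in descendant_axis(div, "input", n)])
```

The only structural difference between the two helpers is where the index is applied: inside the loop over parents for //, and once over the whole collected list for descendant::.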

You can read about predicates and axes in §2.4 Predicates. Relevant pieces:

(...) the ancestor, ancestor-or-self, preceding, and preceding-sibling axes are reverse axes; all other axes are forward axes.

[Thus descendant is a forward axis.]

The proximity position of a member of a node-set with respect to an axis is defined to be the position of the node in the node-set ordered in document order if the axis is a forward axis (...). The first position is 1.

A predicate filters a node-set with respect to an axis to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated with that node as the context node, with the number of nodes in the node-set as the context size, and with the proximity position of the node in the node-set with respect to the axis as the context position;
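To connect the quoted definitions to the example, the sketch below computes, for each <input>, the context position and context size it would have under the child axis of its parent (the // form) and under the descendant axis of the div (the descendant:: form). It is an illustration using Python's standard xml.etree.ElementTree; the parent_of map and the tuple layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<div class="popupContent">
  <table>
    <tr class="aclass"><td> Hello </td> <td> <input type="text" value="FIRST" /> </td></tr>
    <tr class="aclass"><td> Goodbye </td> <td> <input type="text" value="SECOND" /> </td></tr>
  </table>
</div>"""

div = ET.fromstring(DOC)

# ElementTree elements keep no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in div.iter() for child in parent}

# Node-set for div/descendant::input, in document order (forward axis).
descendant_inputs = list(div.iter("input"))

positions = []
for pos_in_descendants, node in enumerate(descendant_inputs, start=1):
    # Node-set for child::input evaluated from this node's parent.
    siblings = [c for c in parent_of[node] if c.tag == "input"]
    pos_in_children = siblings.index(node) + 1
    positions.append((
        node.get("value"),
        pos_in_children, len(siblings),                  # position, size on the child axis
        pos_in_descendants, len(descendant_inputs),      # position, size on the descendant axis
    ))

for row in positions:
    print(row)
```

Both inputs have position 1 of 1 on the child axis, so [1] keeps both and [2] keeps neither; on the descendant axis they have positions 1 and 2 of 2, so [2] picks out the second one.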