XPath for selecting a section of an article

Posted 2020-04-01 06:28

Question:

Suppose part of an article looks like this (the HTML source):

<h2>Introduction</h2>
  ....
<h2>References</h2>
  ...a bunch of text...
<h2>Further Readings</h2>  //optional
  .....

I would like to know whether an XPath expression can extract the "References" part of the example above.

I tried something like //h2[contains(.,'References')]/following::*, but I don't know how to specify the end of the section I want, so it returns the rest of the document.

Answer 1:

If you want the elements up to the next h2, use an XPath like this:

//*[following-sibling::h2[preceding-sibling::h2[1][contains(.,'References')]]  and preceding-sibling::h2[contains(.,'References')]]

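What it means: it finds every element that has

-- a following-sibling h2 whose first preceding h2 contains 'References'

-- a preceding-sibling h2 containing 'References'

The first rule matches every sibling element from the beginning of the XML up to the h2 that follows the 'References' heading. The second matches every sibling after the 'References' h2 to the end of the XML. Their intersection gives the elements needed. Note that the first rule requires a later h2, so this expression returns nothing when 'References' is the last section.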
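The intersection idea can be checked outside an XPath engine. Below is a minimal sketch in Python that rebuilds the two rules as index sets over the sibling list. It assumes well-formed markup and uses only the standard library; the sample markup and the names `SAMPLE`, `text_of`, and `section_by_intersection` are made up for illustration. It is not an XPath evaluator, only a way to see which elements the two rules keep.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, made well-formed so ElementTree can parse it.
SAMPLE = """<body>
  <h2>Introduction</h2>
  <p>intro</p>
  <h2>References</h2>
  <p>first reference</p>
  <ul><li>second reference</li></ul>
  <h2>Further Readings</h2>
  <p>more</p>
</body>"""


def text_of(element):
    """Full text content of an element, similar to XPath's string value."""
    return "".join(element.itertext())


def section_by_intersection(parent, title):
    """Return the siblings between the h2 containing `title` and the next h2."""
    siblings = list(parent)
    h2_positions = [i for i, el in enumerate(siblings) if el.tag == "h2"]
    # Position of the first h2 whose text contains the title.
    start = next((i for i in h2_positions if title in text_of(siblings[i])), None)
    if start is None:
        return []
    # Position of the h2 that directly follows it; rule 1 needs this heading.
    end = next((i for i in h2_positions if i > start), None)
    if end is None:
        return []
    before_next_h2 = set(range(0, end))                     # rule 1
    after_title_h2 = set(range(start + 1, len(siblings)))   # rule 2
    return [siblings[i] for i in sorted(before_next_h2 & after_title_h2)]


root = ET.fromstring(SAMPLE)
section = section_by_intersection(root, "References")
print([el.tag for el in section])  # prints ['p', 'ul']
```

Like the XPath expression, the sketch returns an empty list when no h2 follows the matching heading.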

Or the XPath can be built on your suggestion:

//h2[.='References']/following-sibling::*[preceding-sibling::h2[1][contains(.,'References')] and not(name()='h2')]

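This takes everything after the required h2 (//h2[.='References']/following-sibling::*) that is not itself an h2 and whose first preceding h2 is the 'References' heading. Because it does not depend on a later h2, it also works when 'References' is the last section.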
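The second expression amounts to walking the siblings after the heading and stopping at the next h2. Here is a rough Python sketch of that walk, using only the standard library and assuming well-formed markup. The function name `section_after_heading` and both sample strings are hypothetical; the second sample omits the optional trailing heading.

```python
import xml.etree.ElementTree as ET


def section_after_heading(parent, title):
    """Collect the siblings that follow the h2 titled `title`, up to the next h2."""
    collected = []
    inside = False
    for element in parent:
        if element.tag == "h2":
            if inside:
                break  # the next h2 ends the section
            # Enter the section only at the heading with the exact title.
            inside = "".join(element.itertext()).strip() == title
        elif inside:
            collected.append(element)
    return collected


# Hypothetical samples: one with a later heading, one where the section is last.
WITH_NEXT = ("<body><h2>Introduction</h2><p>intro</p><h2>References</h2>"
             "<p>ref a</p><p>ref b</p><h2>Further Readings</h2><p>more</p></body>")
LAST = ("<body><h2>Introduction</h2><p>intro</p><h2>References</h2>"
        "<p>ref a</p><p>ref b</p></body>")

for markup in (WITH_NEXT, LAST):
    found = section_after_heading(ET.fromstring(markup), "References")
    print([el.text for el in found])  # prints ['ref a', 'ref b'] both times
```

Both inputs give the same two paragraphs, which is the behaviour wanted when the "Further Readings" heading is optional.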



Answer 2:

An XPath that matches the heading above would be

//h2[text()='References']

To check that the XPath is correct, open the web page in Chrome, right-click and choose Inspect, press Esc to open the console in the developer tools, and type

$x("//h2[text()='References']") and press Enter.

It returns the matching HTML element. Hover over that line: if the "References" text is highlighted on the page, the XPath is correct.
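A similar check can be run outside the browser. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and has no `text()` test, so it uses the `[.='References']` form instead of the expression above. The `PAGE` markup is a made-up, well-formed stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the article markup.
PAGE = ("<body><h2>Introduction</h2><p>intro</p>"
        "<h2>References</h2><p>ref a</p></body>")

root = ET.fromstring(PAGE)
# ElementTree's XPath subset: match h2 elements whose full text is 'References'.
matches = root.findall(".//h2[.='References']")
print(len(matches), [m.tag for m in matches])  # prints 1 ['h2']
```

The result is the heading element only, so this expression locates where the section starts rather than the content that follows it.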



Tags: html xpath