Suppose a section of an article is as follows (the html source):
<h2>Introduction</h2>
....
<h2>References</h2>
...a bunch of text...
<h2>Further Readings</h2> //optional
.....
I would like to know whether an XPath expression can extract just the "References" part of the example above.
I tried something like //h2[contains(.,'References')]/following::*
but I don't know how to specify where my desired section ends, so it returns the rest of the document.
If you want the elements up to the next h2, use an XPath like this:
//*[following-sibling::h2[preceding-sibling::h2[1][contains(.,'References')]] and preceding-sibling::h2[contains(.,'References')]]
What it means: it finds every element that has
-- an h2 ahead of it whose nearest preceding h2 contains 'References'
-- an h2 behind it that contains 'References'
The 1st rule keeps all elements from the beginning of the document up to the next h2 tag. The 2nd keeps everything after the wanted h2 tag to the end of the document. Their intersection gives the needed elements.
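The intersection can be imitated in plain Python to see which elements each rule keeps. This is only a sketch: the sample markup, the div wrapper and the function name are made up for illustration. Python's standard ElementTree does not implement the following-sibling and preceding-sibling axes, so the two rules are written as index comparisons over the list of siblings rather than as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the <div> wrapper only gives the fragment a single root.
SAMPLE = """<div>
<h2>Introduction</h2><p>intro</p>
<h2>References</h2><p>ref 1</p><p>ref 2</p>
<h2>Further Readings</h2><p>more</p>
</div>"""


def section_by_intersection(root, title):
    siblings = list(root)
    h2_positions = [i for i, el in enumerate(siblings) if el.tag == "h2"]
    # Position of the first h2 whose text contains the title.
    start = next(i for i in h2_positions if title in "".join(siblings[i].itertext()))
    # Position of the next h2 after it, or None when the section runs to the end.
    end = next((i for i in h2_positions if i > start), None)

    # Rule 2: everything after the wanted h2.
    after_heading = set(range(start + 1, len(siblings)))
    # Rule 1: everything that still has the next h2 ahead of it.
    # With no next h2 this set is empty, just as the XPath predicate would fail.
    before_next = set(range(end)) if end is not None else set()

    return [siblings[i] for i in sorted(after_heading & before_next)]


root = ET.fromstring(SAMPLE)
print([el.text for el in section_by_intersection(root, "References")])
# prints ['ref 1', 'ref 2']
```

Note that the result is empty when no h2 follows "References". The XPath above behaves the same way, so it only suits pages where a later heading such as "Further Readings" is actually present.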
Or the XPath can be built on your own attempt:
//h2[.='References']/following-sibling::*[preceding-sibling::h2[1][contains(.,'References')] and not(name()='h2')]
It takes everything after the wanted h2 tag, //h2[.='References']/following-sibling::*,
and keeps only the elements that are not h2 and whose nearest preceding h2 is our h2 tag.
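The same selection can be sketched procedurally, which may help when checking what this expression should return. The markup strings and the function name below are hypothetical. The code walks the siblings by hand because the standard ElementTree module does not support the following-sibling axis, so it roughly mirrors the XPath rather than evaluating it.

```python
import xml.etree.ElementTree as ET


def section_after_heading(root, title):
    """Collect the siblings between the h2 titled `title` and the next h2."""
    collected = []
    inside = False
    for el in root:
        if el.tag == "h2":
            # Every h2 either opens the wanted section or closes it.
            inside = "".join(el.itertext()) == title
            continue
        if inside:
            collected.append(el)
    return collected


# Hypothetical pages, one with a heading after "References" and one without.
WITH_NEXT = (
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p><ul><li>ref 2</li></ul>"
    "<h2>Further Readings</h2><p>more</p></div>"
)
WITHOUT_NEXT = (
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p></div>"
)

for markup in (WITH_NEXT, WITHOUT_NEXT):
    found = section_after_heading(ET.fromstring(markup), "References")
    print([el.tag for el in found])
# prints ['p', 'ul'] and then ['p']
```

Unlike the first expression, this approach does not need a later h2 to exist, so it also covers the case where "Further Readings" is missing.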
The XPath for the "References" heading itself would be
//h2[text()='References']
To check that the XPath is correct, open the web page in Chrome, right-click and choose Inspect, press the Esc key to open the console in the developer tools, and type
$x("//h2[text()='References']") and hit Enter.
It returns one HTML element. Hover over that line: if the "References" text is highlighted on the page, the XPath is correct.
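A similar check can be scripted outside the browser. The sketch below uses Python's standard ElementTree on a made-up sample page. ElementTree supports only a subset of XPath and has no text() test, so the predicate [.='References'] (available since Python 3.7) stands in for it here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the <div> wrapper only gives the fragment a single root.
SAMPLE = (
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p>"
    "<h2>Further Readings</h2><p>more</p></div>"
)

root = ET.fromstring(SAMPLE)
# [.='References'] compares the element's complete text with the given string.
matches = root.findall(".//h2[.='References']")

print(len(matches))
print([(m.tag, m.text) for m in matches])
# prints 1 and then [('h2', 'References')]
```

Here the match is the heading element alone; the paragraphs under it are not part of the result.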