XPath for selecting a section of an article

Posted 2020-04-01 06:28

Suppose a section of an article is as follows (the html source):

<h2>Introduction</h2>
  ....
<h2>References</h2>
  ...a bunch of text...
<h2>Further Readings</h2>  //optional
  .....

I would like to know whether it is possible to extract the "References" part of the example above with an XPath expression.

I tried something like //h2[contains(.,'References')]/following::*, but I don't know how to specify the end of the section I want, so it returns the rest of the document.

Tags: html xpath
2 answers
何必那么认真
#2 · 2020-04-01 06:59

The XPath for the text above would be

//h2[text()='References']

To check that the XPath is correct, open the web page in Chrome, right-click and choose Inspect, press Esc to open the console of the developer tools, and type

$x("//h2[text()='References']")

then press Enter. The console prints the matching element. Hover over that line and see whether the "References" text is highlighted on the page; if it is, the XPath is correct.

Fickle 薄情
#3 · 2020-04-01 07:02

If you want the elements up to the next h2, use an XPath like this:

//*[following-sibling::h2[preceding-sibling::h2[1][contains(.,'References')]] and preceding-sibling::h2[contains(.,'References')]]

What it means: it finds every element that has

-- a following h2 whose nearest preceding h2 contains 'References'

-- a preceding h2 containing 'References'

The first rule selects all elements from the beginning of the XML up to the next h2 tag. The second selects everything after the required h2 tag to the end of the XML. The intersection of the two gives the needed elements.
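The intersection idea can be sketched in Python. This is only an illustration: it uses plain iteration over the children rather than evaluating the XPath, because the standard library's xml.etree.ElementTree does not support the following-sibling and preceding-sibling axes. The SAMPLE document, the wrapping <div>, and the function name section_by_intersection are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, wrapped in a <div> so it parses as well-formed XML.
SAMPLE = """<div>
<h2>Introduction</h2><p>intro</p>
<h2>References</h2><p>ref 1</p><ul><li>ref 2</li></ul>
<h2>Further Readings</h2><p>more</p>
</div>"""


def section_by_intersection(root, title):
    """Return the sibling elements between the h2 containing `title` and the next h2."""
    children = list(root)
    # Position of the wanted heading (raises StopIteration if it is missing).
    start = next(i for i, el in enumerate(children)
                 if el.tag == "h2" and title in "".join(el.itertext()))
    # Position of the next h2 after it, or None if there is none.
    end = next((i for i, el in enumerate(children)
                if i > start and el.tag == "h2"), None)
    if end is None:
        # Mirrors the XPath above: with no later h2, rule 1 matches nothing.
        return []
    before_next_h2 = set(range(0, end))                      # rule 1
    after_wanted_h2 = set(range(start + 1, len(children)))   # rule 2
    return [children[i] for i in sorted(before_next_h2 & after_wanted_h2)]


section = section_by_intersection(ET.fromstring(SAMPLE), "References")
print([el.tag for el in section])
```

As the empty-list branch shows, this first expression depends on a later h2 being present, which matters because "Further Readings" is optional in the question.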

Or the XPath can be built on your suggestion:

//h2[.='References']/following-sibling::*[preceding-sibling::h2[1][contains(.,'References')] and not(name()='h2')]

This takes everything after the required h2 tag (//h2[.='References']/following-sibling::*) that is not itself an h2 and whose nearest preceding h2 is our h2 tag.
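The same selection can be imitated with a forward scan in Python, which may help when checking the behaviour with and without the optional trailing heading. This is a rough sketch, not an XPath evaluation: it walks the children with the standard library's xml.etree.ElementTree, and the function name section_after_heading and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def section_after_heading(root, title):
    """Collect the siblings after the h2 whose text equals `title`, up to the next h2."""
    collected, inside = [], False
    for el in root:
        if el.tag == "h2":
            if inside:
                break  # the next h2 ends the section
            # Assumption: surrounding whitespace is ignored, which is more
            # lenient than the exact comparison .='References' in the XPath.
            inside = "".join(el.itertext()).strip() == title
            continue
        if inside:
            collected.append(el)
    return collected


# Hypothetical documents: one with a trailing h2, one without.
WITH_TAIL = ET.fromstring(
    "<div><h2>Introduction</h2><p>a</p><h2>References</h2>"
    "<p>b</p><p>c</p><h2>Further Readings</h2><p>d</p></div>")
NO_TAIL = ET.fromstring(
    "<div><h2>Introduction</h2><p>a</p><h2>References</h2>"
    "<p>b</p><p>c</p></div>")

with_tail = [el.text for el in section_after_heading(WITH_TAIL, "References")]
no_tail = [el.text for el in section_after_heading(NO_TAIL, "References")]
print(with_tail, no_tail)
```

Both documents give the same two paragraphs, so this variant does not need a heading after "References" to exist.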
