XPath for selecting a section of an article

Suppose a section of an article looks like this (the HTML source):

<h2>Introduction</h2>
  ....
<h2>References</h2>
  ...a bunch of text...
<h2>Further Readings</h2>  //optional
  .....

Is it possible to extract the "References" part of the example above with a single XPath expression?

I tried something like //h2[contains(.,'References')]/following::*, but I don't know how to specify the end of the section I want, so it returns the rest of the document.

Tags: html xpath

2 Answers

何必那么认真

Post #2 · 2020-04-01 06:59

The XPath for the heading text above would be

//h2[text()='References']

To check that the XPath is correct, open the page in Chrome, right-click and choose Inspect, press Esc to open the console in the developer tools, and type

$x("//h2[text()='References']")

then press Enter. The console prints the matching element. Hover over that line and check whether the "References" text is highlighted on the page; if it is, the XPath is correct.
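The same check can be sketched outside the browser. The snippet below is only an illustration: the sample markup is hypothetical (modelled on the question and wrapped in a div so it parses as XML), and because Python's standard-library ElementTree supports only a subset of XPath with no text() node test, it uses the predicate [.='References'] as a stand-in. It shows that an expression of this kind matches the heading element only, not the content under it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup modelled on the question; wrapped in a <div>
# so the fragment has a single root element and parses as XML.
SAMPLE = """<div>
<h2>Introduction</h2><p>intro text</p>
<h2>References</h2><p>ref 1</p><p>ref 2</p>
<h2>Further Readings</h2><p>more</p>
</div>"""

root = ET.fromstring(SAMPLE)

# ElementTree has no text() node test; [.='References'] compares the
# element's complete text content instead (available since Python 3.7).
matches = root.findall(".//h2[.='References']")

# Collect what was matched: only the heading, none of its following siblings.
matched_tags = [el.tag for el in matches]
matched_text = [el.text for el in matches]
print(matched_tags, matched_text)
```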

Fickle 薄情

Post #3 · 2020-04-01 07:02

If you want the elements up to the next h2, use an XPath like this:

//*[following-sibling::h2[preceding-sibling::h2[1][contains(.,'References')]]  and preceding-sibling::h2[contains(.,'References')]]

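What it means: it finds every element that has

-- a following h2 sibling whose nearest preceding h2 contains 'References' (that is, the h2 that ends the section)

-- a preceding h2 sibling containing 'References'

The first rule matches all elements from the beginning of the XML up to the next h2 tag. The second matches everything after the required h2 tag to the end of the XML. Their intersection gives the needed elements. Note that the first rule needs an h2 after the section, so this expression selects nothing when 'References' is the last section.

Or an XPath can be built on your suggestion: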

//h2[.='References']/following-sibling::*[preceding-sibling::h2[1][contains(.,'References')] and not(name()='h2')]

This takes everything after the required h2 tag (//h2[.='References']/following-sibling::*) that is not itself an h2 and whose nearest preceding h2 is our h2.
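For readers who want to see the selection rule spelled out, here is a minimal Python sketch of the same logic done procedurally. It does not evaluate the XPath itself, since the standard-library ElementTree does not support the following-sibling and preceding-sibling axes. The function name section_after and the sample markup are hypothetical, and the sketch assumes the headings and the section content are siblings under one parent, as in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup modelled on the question; wrapped in a <div>
# so the fragment has a single root element and parses as XML.
SAMPLE = """<div>
<h2>Introduction</h2><p>intro text</p>
<h2>References</h2><p>ref 1</p><p>ref 2</p>
<h2>Further Readings</h2><p>more</p>
</div>"""


def section_after(parent, title, heading="h2"):
    """Return the siblings that follow the heading with the given text,
    stopping at the next heading (or at the end of the parent)."""
    selected = []
    inside = False
    for child in parent:  # children are visited in document order
        if child.tag == heading:
            # A heading opens the wanted section if its text matches;
            # any other heading closes it. Whitespace is stripped here,
            # which is slightly more lenient than the XPath test .='References'.
            inside = "".join(child.itertext()).strip() == title
            continue  # headings themselves are never selected
        if inside:
            selected.append(child)
    return selected


root = ET.fromstring(SAMPLE)
references = section_after(root, "References")
print([el.text for el in references])
```

Like the second expression, this keeps working when "Further Readings" is absent: the loop simply runs to the end of the parent.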
