XPath to select between two HTML comments is not w

Posted 2019-03-04 06:13

Question:

I'm trying to select the content between two HTML comments, but I'm having trouble getting it right (see "XPath to select between two HTML comments?"). The problem seems to arise when a new comment begins on the same line as the previous one.

My HTML:

<html>
 ........
 <!-- begin content -->
 <div>some text</div>
 <div>
   <p>Some more elements</p>
 </div>
 <!-- end content --><!-- begin content -->
 <div>more text</div>
 <!-- end content -->
 .......
</html>

I use:

doc.xpath("//node()[preceding-sibling::comment()[. = ' begin content ']]
          [following-sibling::comment()[. = ' end content ']]")

Result:

<div>some text</div>
<div>
  <p>Some more elements</p>
</div>
<!-- end content --><!-- begin content -->
<div>more text</div>

What I'm trying to get:

<div>some text</div>
<div>
  <p>Some more elements</p>
</div>

Answer 1:

If you are only interested in the first pair of comments, you can start from the first "begin content" comment:

//comment()[1][.=' begin content ']/following::*[not(preceding::comment()[.=' end content '])]

That is:

//comment()[1][.=' begin content ']           <-- start from the first comment among its siblings, if it is a "begin content" comment
    /following::*                             <-- take all elements that follow it
         [not(preceding::comment()[.=' end content '])] <-- keep only those with no preceding "end content" comment
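The selection logic of the expression above can be sketched without an XPath engine. The following is only an illustration, written in Python with the standard library's `xml.etree.ElementTree`; the function name `first_block` and the trimmed sample markup are assumptions made for this example, not part of the original question. It assumes the input is well-formed XML (real-world HTML would need an HTML parser) and that the marker comments are direct children of the root element. Unlike `following::*`, which also matches nested elements such as the `<p>` separately, this walk returns only the top-level siblings between the first pair of markers.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the markup from the question
# (the "......" placeholders are left out).
HTML = """<html>
 <!-- begin content -->
 <div>some text</div>
 <div>
   <p>Some more elements</p>
 </div>
 <!-- end content --><!-- begin content -->
 <div>more text</div>
 <!-- end content -->
</html>"""


def first_block(markup, begin=" begin content ", end=" end content "):
    """Return the sibling elements between the first begin/end comment pair.

    Hypothetical helper for illustration; it only looks at direct
    children of the root element.
    """
    # insert_comments=True (Python 3.8+) keeps comment nodes in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(markup, parser=parser)

    selected = []
    inside = False
    for node in root:  # direct children, in document order
        if node.tag is ET.Comment:
            if not inside and node.text == begin:
                inside = True  # first begin marker found
            elif inside and node.text == end:
                break  # stop at the first end marker, ignore later pairs
        elif inside:
            selected.append(node)
    return selected


blocks = first_block(HTML)
for element in blocks:
    print(ET.tostring(element, encoding="unicode").strip())
```

Because the loop stops at the first "end content" comment, it does not matter that the next "begin content" comment sits on the same line: the second block is never reached.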