I have a big HTML page, and I want to select certain nodes using XPath:
<html>
........
<!-- begin content -->
<div>some text</div>
<div><p>Some more elements</p></div>
<!-- end content -->
.......
</html>
I can select the HTML after <!-- begin content --> using:
"//comment()[. = ' begin content ']/following::*"
Similarly, I can select the HTML before <!-- end content --> using:
"//comment()[. = ' end content ']/preceding::*"
But how do I write an XPath expression that selects all the HTML between the two comments?
I would look for elements that are preceded by the first comment and followed by the second comment:
"//*[preceding::comment()[. = ' begin content ']][following::comment()[. = ' end content ']]"
Note that this returns every element in between, at any depth. If you iterate through the returned nodes, you will therefore see some nested content twice. For example, "Some more elements" appears once as part of the second <div> and again as its child <p>.
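To see the duplication concretely, here is a minimal sketch in Python. It assumes the third-party lxml package is installed (lxml supports full XPath 1.0, including the comment() test and the preceding/following axes), and the sample page and the names PAGE and BETWEEN are made up for illustration. The expression keeps every element that has the begin comment on its preceding axis and the end comment on its following axis, as described above.

```python
# Sketch only: assumes the third-party lxml package is installed (pip install lxml).
from lxml import html

# Hypothetical sample page standing in for the real "big HTML page".
PAGE = """<html><body>
<p>before</p>
<!-- begin content -->
<div>some text</div>
<div><p>Some more elements</p></div>
<!-- end content -->
<p>after</p>
</body></html>"""

# Every element, at any depth, with the begin comment before it and the
# end comment after it in document order.
BETWEEN = (
    "//*[preceding::comment()[. = ' begin content ']]"
    "[following::comment()[. = ' end content ']]"
)

doc = html.fromstring(PAGE)
matches = doc.xpath(BETWEEN)

tags = [el.tag for el in matches]
texts = [el.text_content() for el in matches]
print(tags)   # ['div', 'div', 'p']
print(texts)  # ['some text', 'Some more elements', 'Some more elements']
```

The nested <p> matches because the preceding and following axes ignore nesting: the begin comment comes before it and the end comment comes after it in document order, even though it sits inside the second <div>.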
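I think you might actually want just the top-level nodes in between, i.e. the siblings of the comments. This can be done using the preceding-sibling and following-sibling axes instead:
"//*[preceding-sibling::comment()[. = ' begin content ']][following-sibling::comment()[. = ' end content ']]"
Update - Including comments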
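Using //* only returns element nodes, which excludes comments (and other node types, such as text nodes). You could change "*" to "node()" to return everything. If you just want element nodes and comments (i.e. not everything), you can use the self axis:
"//node()[self::* or self::comment()][preceding-sibling::comment()[. = ' begin content ']][following-sibling::comment()[. = ' end content ']]"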
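Under the same assumptions (Python with the third-party lxml package, and a made-up sample page that adds an extra comment inside the content), the sketch below compares the sibling-based approach with the self-axis variant described above. The names PAGE, BETWEEN_SIBLINGS and describe are hypothetical helpers for illustration only. Because only direct siblings of the two marker comments qualify, the nested <p> no longer shows up on its own.

```python
# Sketch only: assumes the third-party lxml package is installed (pip install lxml).
from lxml import html

# Hypothetical sample page; the middle comment shows how comments are handled.
PAGE = """<html><body>
<p>before</p>
<!-- begin content -->
<div>some text</div>
<!-- a note -->
<div><p>Some more elements</p></div>
<!-- end content -->
<p>after</p>
</body></html>"""

# Shared predicates: the node must sit between the two marker comments
# among its own siblings.
BETWEEN_SIBLINGS = (
    "[preceding-sibling::comment()[. = ' begin content ']]"
    "[following-sibling::comment()[. = ' end content ']]"
)
ELEMENTS_ONLY = "//*" + BETWEEN_SIBLINGS
ELEMENTS_AND_COMMENTS = "//node()[self::* or self::comment()]" + BETWEEN_SIBLINGS


def describe(node):
    """Return the tag name for elements and the marked-up text for comments."""
    if isinstance(node.tag, str):  # in lxml, a comment's .tag is not a string
        return node.tag
    return "<!--" + node.text + "-->"


doc = html.fromstring(PAGE)
elements = [describe(n) for n in doc.xpath(ELEMENTS_ONLY)]
with_comments = [describe(n) for n in doc.xpath(ELEMENTS_AND_COMMENTS)]
print(elements)       # ['div', 'div']
print(with_comments)  # ['div', '<!-- a note -->', 'div']
```

The marker comments themselves are excluded, since the begin comment has no begin comment before it and the end comment has no end comment after it. Dropping the self-axis filter and using plain //node() may also return whitespace-only text nodes, depending on how the parser treats blanks.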