I want to extract text outside tags. For example,
<body>
This is an example
<p>
blablabla
</p>
<references>
refer 1
refer 2
</references>
</body>
I want to get only the text "This is an example", without the text inside the other tags (p or references). I tried several methods, but none of them worked. Can anyone help? Many thanks.
Think of the text inside an element as a node in its own right. A text node is selected with the node test text(). For example, given:
<body>
This is an example
<p>
blablabla
</p>
<references>
refer 1
refer 2
</references>
another example
</body>
XPath:
"/body/text()"
will retrieve all the child text nodes of body, namely "This is an example" and "another example" (plus any whitespace-only text nodes between the child elements, if the parser keeps them), while:
"/body/text()[1]"
will retrieve only the first one, "This is an example". If you want all the descendant text nodes, you can use:
"/body//text()"
or, if you want all the text nodes inside the first p:
"/body/p[1]//text()"
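For what it's worth, here is a minimal sketch of the same four selections in Python, using only the standard library. Note that xml.etree.ElementTree supports only a subset of XPath and has no text() node test, so the two helpers below (child_text_nodes and descendant_text_nodes, names made up for this example, not library functions) emulate it through .text, .tail and itertext(). They also trim each piece and drop whitespace-only ones, which a real XPath engine would not do by itself.

```python
import xml.etree.ElementTree as ET

XML = """<body>
This is an example
<p>
blablabla
</p>
<references>
refer 1
refer 2
</references>
another example
</body>"""


def child_text_nodes(elem):
    """Approximate XPath text() on elem: its direct text nodes, in document order."""
    # ElementTree keeps the text before the first child in elem.text and the
    # text that follows each child element in that child's .tail.
    pieces = [elem.text] + [child.tail for child in elem]
    # Assumption for this sketch: skip missing or whitespace-only pieces and trim the rest.
    return [p.strip() for p in pieces if p and p.strip()]


def descendant_text_nodes(elem):
    """Approximate XPath .//text() on elem: every text node beneath it."""
    return [t.strip() for t in elem.itertext() if t.strip()]


body = ET.fromstring(XML)

direct = child_text_nodes(body)                   # like /body/text()
first = direct[0]                                 # like /body/text()[1]
everything = descendant_text_nodes(body)          # like /body//text()
first_p = descendant_text_nodes(body.find("p"))   # like /body/p[1]//text()

print(direct)   # ['This is an example', 'another example']
print(first)    # This is an example
print(first_p)  # ['blablabla']
```

If a full XPath 1.0 engine is available (for example a third-party library), the expressions above can be passed to it directly and the helpers are not needed.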
Use this XPath: /body/text(). It will select the text "This is an example".