Let's say I have an XML document that looks like this:
<a>
<b>
<![CDATA[some text]]>
<c>xxx</c>
<d>yyy</d>
</b>
</a>
I can't find a way to get "some text". Any ideas?
If I use "a/b", it also returns xxx and yyy.
If I use "a/b/text()", it returns nothing.
You can't actually select a CDATA section. CDATA is just a way of telling the parser to treat special characters such as < and & as literal text rather than markup, so your input document looks to XPath exactly the same as:
<a>
<b>
some text
<c>xxx</c>
<d>yyy</d>
</b>
</a>
(Having said that, if you're using DOM, some DOM XPath engines fail to implement the spec correctly and treat the CDATA content as a text node separate from the text outside the CDATA section.)
The XPath expression a/b/text() should select three text nodes, of which the first contains "some text" along with surrounding whitespace.
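The DOM caveat can be seen without any XPath engine at all. The following is only a sketch, assuming Python's standard-library xml.dom.minidom as the DOM implementation; the helper name merged_text_nodes is made up for illustration. It lists the raw DOM children of b, then joins adjacent text and CDATA nodes the way the XPath data model expects.

```python
from xml.dom import Node, minidom

XML = """<a>
<b>
<![CDATA[some text]]>
<c>xxx</c>
<d>yyy</d>
</b>
</a>"""

# DOM node types that both count as character data in the XPath data model.
TEXT_KINDS = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def merged_text_nodes(element):
    """Hypothetical helper: join adjacent Text/CDATASection children of element."""
    merged, run = [], []
    for child in element.childNodes:
        if child.nodeType in TEXT_KINDS:
            run.append(child.data)
        elif run:
            # Any other node (element, comment, ...) ends the current run.
            merged.append("".join(run))
            run = []
    if run:
        merged.append("".join(run))
    return merged


b = minidom.parseString(XML).getElementsByTagName("b")[0]

# Raw DOM view: the CDATA section is kept as its own node.
dom_kinds = [child.nodeName for child in b.childNodes]
print(dom_kinds)

# XPath-style view: one text node per run of adjacent character data.
xpath_like = merged_text_nodes(b)
print(xpath_like)
```

In the raw DOM view the CDATA section sits between two whitespace-only text nodes, which is how an engine that indexes DOM nodes directly can end up returning only whitespace for the first text node. After merging, b has three text nodes and the first one contains "some text" plus the surrounding line breaks.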
With the XPath data model, the path /a/b/text()[1]
should select a text node whose string value is a line break, some spaces, the text "some text", and then another line break and some spaces.
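As a rough check of that string value, here is a sketch using Python's standard-library xml.etree.ElementTree. Two assumptions: the indentation in the sample is mine (the answer mentions spaces, so the original was presumably indented), and since ElementTree's limited XPath subset has no text() step, the .text and .tail attributes stand in for the text nodes. The helper name text_nodes_of_b is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Indented copy of the question's document (indentation is an assumption).
WITH_CDATA = (
    "<a>\n"
    "  <b>\n"
    "    <![CDATA[some text]]>\n"
    "    <c>xxx</c>\n"
    "    <d>yyy</d>\n"
    "  </b>\n"
    "</a>"
)
# Same document with the CDATA markers removed.
WITHOUT_CDATA = WITH_CDATA.replace("<![CDATA[some text]]>", "some text")


def text_nodes_of_b(xml_source):
    """Hypothetical helper: text children of <b>, in document order."""
    b = ET.fromstring(xml_source).find("b")
    # ElementTree keeps the text before the first child in .text and the
    # text after each child in that child's .tail.
    pieces = [b.text] + [child.tail for child in b]
    return [piece for piece in pieces if piece]


cdata_nodes = text_nodes_of_b(WITH_CDATA)
plain_nodes = text_nodes_of_b(WITHOUT_CDATA)

print(repr(cdata_nodes[0]))        # first text node, whitespace included
print(cdata_nodes == plain_nodes)  # CDATA markers make no difference
print(cdata_nodes[0].strip())      # the text the question asks for
```

If the surrounding whitespace is unwanted, the XPath-side equivalent of the final strip() is normalize-space(/a/b/text()[1]).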