How to extract text outside tags in XML

Posted 2020-04-07 05:09

Question:

I want to extract text outside tags. For example,

<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
</body>

I want to get only the text "This is an example", without the text inside the other tags (p or references). I tried several methods, but none of them worked. Can anyone help? Many thanks.

Answer 1:

Think of the text inside an element as a node of its own. A text node is selected with the node test text(). For example, given:

<body>
    This is an example
    <p>
    blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
    another example
</body>

XPath:

"/body/text()"

This selects all the text nodes that are direct children of body, such as "This is an example" and "another example", while:

"/body/text()[1]"

selects only the first one, "This is an example". If you want all the descendant text nodes, you can use:

"/body//text()"

or, if you want all the text nodes inside the first p:

"/body/p[1]//text()"
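
The difference between child and descendant text nodes can be tried out with a rough Python sketch. It assumes the standard-library xml.dom.minidom module, which has no XPath engine, so each expression is emulated by walking the node tree by hand. The helper names (child_text_nodes, descendant_text_nodes, clean) are made up for this illustration, and the sample document is a copy of the one above. Text nodes include the surrounding line breaks and indentation, so the sketch strips them and drops whitespace-only nodes.

```python
from xml.dom import minidom

XML = """<body>
    This is an example
    <p>
    blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
    another example
</body>"""


def child_text_nodes(element):
    """Text nodes that are direct children of element (like element/text())."""
    return [n for n in element.childNodes if n.nodeType == n.TEXT_NODE]


def descendant_text_nodes(element):
    """Text nodes at any depth below element (like element//text())."""
    found = []
    for n in element.childNodes:
        if n.nodeType == n.TEXT_NODE:
            found.append(n)
        elif n.nodeType == n.ELEMENT_NODE:
            found.extend(descendant_text_nodes(n))
    return found


def clean(nodes):
    """Strip surrounding whitespace and drop whitespace-only text nodes."""
    return [n.data.strip() for n in nodes if n.data.strip()]


body = minidom.parseString(XML).documentElement

# Emulates /body/text()
direct = clean(child_text_nodes(body))

# Emulates /body/text()[1]
first = child_text_nodes(body)[0].data.strip()

# Emulates /body//text()
everything = clean(descendant_text_nodes(body))

# Emulates /body/p[1]//text(): take the first p that is a direct child of body
p_children = [n for n in body.childNodes
              if n.nodeType == n.ELEMENT_NODE and n.tagName == "p"]
in_first_p = clean(descendant_text_nodes(p_children[0]))

print(direct)      # ['This is an example', 'another example']
print(first)       # This is an example
print(in_first_p)  # ['blablabla']
```

A library with real XPath support would accept the expressions directly; the manual walk is only there to show which nodes each expression is meant to pick.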


Answer 2:

Use this XPath: /body/text(). It selects the text directly inside body, here "This is an example".
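
For anyone limited to Python's standard-library xml.etree.ElementTree, here is a minimal sketch of the same idea applied to the document from the question. ElementTree's XPath subset cannot select text() nodes, so this is a swapped-in technique rather than the XPath itself: text before the first child element is read from .text, and text following each child element from that child's .tail. The variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

QUESTION_XML = """<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
</body>"""

body = ET.fromstring(QUESTION_XML)

# Text before the first child element is stored in body.text;
# text that follows a child element is stored in that child's .tail.
pieces = [body.text] + [child.tail for child in body]

# Drop None and whitespace-only pieces, and strip the indentation from the rest.
outside_text = [p.strip() for p in pieces if p and p.strip()]

print(outside_text)  # ['This is an example']
```

The text inside p and references never shows up here, because it belongs to those elements' own .text rather than to body.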