I need to get the HTML contents of answer in this bit of XML:
<qa>
<question>Who are you?</question>
<answer>Who who, <strong>who who</strong>, <em>me</em></answer>
</qa>
So I want to get the string "Who who, <strong>who who</strong>, <em>me</em>".
If I have the answer as a SimpleXMLElement, I can call asXML() to get "<answer>Who who, <strong>who who</strong>, <em>me</em></answer>", but how do I get the inner XML of an element without the element itself wrapped around it?
I'd prefer ways that don't involve string functions, but if that's the only way, so be it.
The most straightforward solution is to implement a custom innerXML function on top of SimpleXML:
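A minimal sketch of what such a function could look like. Only the name simplexml_innerXML() comes from this answer; the body is an assumption. It goes through dom_import_simplexml() because SimpleXML's children() does not expose text nodes, and the mixed content of answer needs them.

```php
<?php
// Hypothetical helper: the name comes from the answer, the body is assumed.
function simplexml_innerXML(SimpleXMLElement $el): string
{
    // DOM exposes text, CDATA and element children alike.
    $node = dom_import_simplexml($el);
    $inner = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes just that node.
        $inner .= $node->ownerDocument->saveXML($child);
    }
    return $inner;
}

$qa = simplexml_load_string(
    '<qa><question>Who are you?</question>'
    . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

echo simplexml_innerXML($qa->answer), "\n";
```

Because each child node is serialized on its own, the wrapping answer tags never enter the result and no string trimming is needed.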
In your code, replace
$body_content = $el->asXml();
with
$body_content = simplexml_innerXML($el);
However, you could also switch to another API that distinguishes between innerXML (what you are looking for) and outerXML (what you get now). The Microsoft DOM library offers this distinction, but unfortunately PHP DOM doesn't.
I found that the PHP XMLReader API offers this distinction; see readInnerXML(). This API takes quite a different approach to processing XML, though. Try it.
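A rough sketch of the XMLReader route, assuming the document from the question is available as a string. XMLReader is a forward-only cursor, so the code reads node by node until it reaches the answer element and then asks for its inner XML. The variable names are illustrative only.

```php
<?php
$xml = '<qa><question>Who are you?</question>'
     . '<answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>';

$reader = new XMLReader();
$reader->XML($xml);

$inner = null;
// Advance the cursor until it sits on the opening <answer> tag.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'answer') {
        // Serializes the element's children, without the element itself.
        $inner = $reader->readInnerXml();
        break;
    }
}
$reader->close();

echo $inner, "\n";
```

The streaming model pays off mostly on large files; for a small document like this one it is mainly a way to reach readInnerXml().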
Finally, I would stress that XML is meant for extracting data as values rather than as subtrees. That's why you are running into trouble finding the right API. It would be more 'standard' to store the HTML subtree as a value (with all tags escaped) rather than as an XML subtree. Also beware that some HTML syntax is not XML-compatible (for example, HTML allows unclosed void elements, while XML requires every element to be closed). Anyway, in practice, your approach is definitely more convenient for editing the XML file.
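To illustrate the "store it as a value" alternative, here is a small sketch, assuming the HTML is escaped with htmlspecialchars() when the file is written. The answer element then holds plain text, and a string cast gives the HTML back with no inner-XML extraction at all.

```php
<?php
$html = 'Who who, <strong>who who</strong>, <em>me</em>';

// Writing: escape the markup so it becomes ordinary character data.
$xml = '<qa><question>Who are you?</question><answer>'
     . htmlspecialchars($html, ENT_XML1 | ENT_QUOTES)
     . '</answer></qa>';

// Reading: the parser unescapes the text, so the cast yields the HTML.
$qa = simplexml_load_string($xml);
$restored = (string) $qa->answer;

echo $restored, "\n";
```

The trade-off is the one noted above: the stored file is harder to edit by hand, since every tag appears in escaped form.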
Using a regex, you could do this:
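One possible regex, shown as a sketch rather than the pattern this answer originally used. The helper name inner_xml_regex() is hypothetical. It captures everything between the opening tag and the matching closing tag of the outer element, and assumes the input starts with that opening tag (no XML prolog).

```php
<?php
// Hypothetical helper: strips the outermost tag pair from an XML string.
function inner_xml_regex(string $outerXml): string
{
    // Group 1: element name (prefix included). Group 2: everything inside.
    // The s modifier lets the dot match newlines in the content.
    $pattern = '~^<([^\s>/]+)[^>]*>(.*)</\1\s*>$~s';
    return preg_match($pattern, trim($outerXml), $m) === 1 ? $m[2] : '';
}

$qa = simplexml_load_string(
    '<qa><answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);

echo inner_xml_regex($qa->answer->asXML()), "\n";
```

A self-closing element such as <answer/> does not match the pattern, so the helper returns an empty string for it.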
After searching for a while, I found no satisfying solution, so I wrote my own function. This function returns the exact innerXml content (including whitespace, of course). To use it, pass it the result of asXML(), like this: getInnerXml($e->asXML()). This function works for elements with many prefixes as well (as in my case, since I could not find any existing method that converts all child nodes of different prefixes).
Output:
If you don't want to strip CDATA sections, comment out lines 6-8.
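A hedged reconstruction of such a function: only the name getInnerXml() and its string argument come from this answer, and the body is an assumption. It cuts between the end of the opening tag and the start of the last closing tag, so the element name and prefix do not matter. The CDATA step is marked with a comment, because the line numbers mentioned above refer to the original listing, not to this sketch.

```php
<?php
// Assumed implementation. Expects $xml to begin with the element's
// opening tag (no XML prolog in front of it).
function getInnerXml(string $xml): string
{
    $open = strpos($xml, '>');      // end of the opening tag
    $close = strrpos($xml, '</');   // start of the last closing tag
    if ($open === false || $close === false || $close < $open) {
        return '';                  // empty or self-closing element
    }
    $inner = substr($xml, $open + 1, $close - $open - 1);

    // CDATA step: unwrap CDATA sections, keeping their content.
    // Remove this statement to leave CDATA sections untouched.
    $inner = preg_replace('~<!\[CDATA\[(.*?)\]\]>~s', '$1', $inner);

    return $inner;
}

$qa = simplexml_load_string(
    '<qa><answer>Who who, <strong>who who</strong>, <em>me</em></answer></qa>'
);
echo getInnerXml($qa->answer->asXML()), "\n";

// Works on prefixed elements too, and keeps surrounding whitespace.
echo getInnerXml('<p:a xmlns:p="urn:x"> hi <![CDATA[<i>x</i>]]> </p:a>'), "\n";
```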
To the best of my knowledge, there is no built-in way to get that. I'd recommend trying SimpleDOM, which is a PHP class extending SimpleXMLElement that offers convenience methods for most of the common problems.
Otherwise, I see two ways of doing that. The first would be to convert your SimpleXMLElement to a DOMNode, then loop over its childNodes to build the XML. The other would be to call asXML(), then use string functions to remove the root node. Be careful, though: asXML() may sometimes return markup that is actually outside of the node it was called from, such as an XML prolog or processing instructions.
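The caveat is easiest to see on a document's root element. The sketch below assumes a tiny one-element document. It shows asXML() returning the XML declaration along with the element, then obtains a clean outer string by passing the node to DOMDocument::saveXML(), and only then strips the root tags with string functions.

```php
<?php
$root = simplexml_load_string('<answer>Who who, <em>me</em></answer>');

// On the root element, asXML() serializes the whole document, so the
// string starts with the XML declaration.
$withProlog = $root->asXML();

// Importing into DOM and passing the node to saveXML() serializes that
// element only.
$node = dom_import_simplexml($root);
$outer = $node->ownerDocument->saveXML($node);

// With a clean outer string, plain string functions can drop the root tags.
$start = strpos($outer, '>') + 1;
$end = strrpos($outer, '</');
$inner = substr($outer, $start, $end - $start);

echo $inner, "\n";
```

Running the same strpos() logic directly on $withProlog would cut at the end of the declaration instead of the opening tag, which is exactly the failure the warning describes.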