I'm trying to parse this XML:
<text:p >Lorem<text:s/>ipsum.</text:p>
I'm using XMLReader for this, and nearly everything works as I need it to. But the <text:s/> element is causing me trouble.
Since I want to remove any formatting tags (e.g. bold), I'm using expand()->textContent to get just the text:
$reader = new XMLReader();
if (!$reader->open("content.xml")) {
    die("Could not open content.xml");
}
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:p') {
        echo utf8_decode($reader->expand()->textContent);
    }
}
In this case I get 'Loremipsum.' instead of 'Lorem ipsum.'. How can I replace every <text:s/> with a space?
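The reason textContent returns 'Loremipsum.' is that <text:s/> is an empty element: it contains no text node, so there is nothing for textContent to concatenate. One possible workaround, sketched here, is to import the expanded node into a DOMDocument, replace each text:s with a text node containing spaces, and only then read textContent. Assumptions: the ODF text namespace URI below, the ODF rule that the optional text:c attribute gives the number of spaces (default 1), and the hypothetical helper names paragraphText() and readParagraphs(). It reads from a string via XMLReader::XML() only to stay self-contained.

```php
<?php
// Hypothetical helpers: turn <text:s/> into real spaces before reading textContent.
const ODF_TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Replace each <text:s/> below $p with text:c spaces (1 if text:c is absent).
function paragraphText(DOMElement $p): string
{
    // The node list may reflect changes to the tree, so copy it before editing.
    $spaces = iterator_to_array($p->getElementsByTagNameNS(ODF_TEXT_NS, 's'));
    foreach ($spaces as $s) {
        $count = (int) ($s->getAttributeNS(ODF_TEXT_NS, 'c') ?: 1);
        $s->parentNode->replaceChild(
            $p->ownerDocument->createTextNode(str_repeat(' ', $count)),
            $s
        );
    }
    return $p->textContent;
}

// Stream through the XML and collect the plain text of every <text:p>.
function readParagraphs(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $doc = new DOMDocument(); // owner document for the editable copies
    $texts = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'text:p') {
            // importNode() gives the expanded subtree an owner we can modify.
            $p = $doc->importNode($reader->expand(), true);
            $texts[] = paragraphText($p);
        }
    }
    $reader->close();
    return $texts;
}
```

With a file, $reader->open("content.xml") would take the place of the XML() call; utf8_decode() can still be applied to each returned string if Latin-1 output is really wanted.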
Update:
I solved it this way: preg_replace('#<text:s(?:\s[^>]*)?/>#', ' ', utf8_decode($reader->readInnerXML()))
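The regex route can be extended a little. The sketch below, a hypothetical helper innerXmlToText(), works on the inner XML of one <text:p>: it assumes text:c carries the number of spaces, uses preg_replace_callback() to emit that many, then strips the remaining formatting tags (such as text:span for bold) and decodes entities like &amp; that readInnerXML() leaves encoded. Regex-based XML handling is fragile, so treat this as a convenience, not a parser.

```php
<?php
// Hypothetical helper: string-based clean-up of the inner XML of one <text:p>.
function innerXmlToText(string $innerXml): string
{
    // <text:s/> or <text:s text:c="N"/> becomes N spaces (1 when text:c is missing).
    $withSpaces = preg_replace_callback(
        '#<text:s((?:\s[^>]*)?)/>#',
        function (array $m): string {
            $count = preg_match('#text:c="(\d+)"#', $m[1], $c) ? (int) $c[1] : 1;
            return str_repeat(' ', $count);
        },
        $innerXml
    );
    // Drop the remaining tags (e.g. text:span used for bold) and decode &amp; etc.
    return html_entity_decode(strip_tags($withSpaces), ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```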
Update:
If I use DOMDocument for parsing instead, how do I have to change the syntax? This is my current XMLReader loop:
$reader = new XMLReader();
$reader->open("zip://folder/".$file.".odt#content.xml");
$html = '';
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:h') {
        if ($reader->getAttribute('text:outline-level') == "2") {
            $html .= '<h2>'.$reader->expand()->textContent.'</h2>';
        }
    }
    elseif ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:p') {
        if ($reader->getAttribute('text:style-name') == "Standard") {
            $str = $reader->readInnerXML();
            // replace text:s-elements with " " at this point
        }
    }
}
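With DOMDocument there is no read() loop: the whole tree is in memory, so one option is to fix up the text:s elements once and then select the wanted elements with DOMXPath. This is a sketch under assumptions: the ODF text namespace URI below, text:c as the space count, wrapping "Standard" paragraphs in <p> (the loop above leaves that part open), escaping with htmlspecialchars(), and the hypothetical function name odfToHtml(). loadXML() on a string is used only to keep it self-contained.

```php
<?php
// Hypothetical DOMDocument/DOMXPath version of the XMLReader loop.
const TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

function odfToHtml(DOMDocument $doc): string
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('text', TEXT_NS);

    // XPath results are a snapshot, so replacing nodes while iterating is safe.
    foreach ($xpath->query('//text:s') as $s) {
        $count = (int) ($s->getAttributeNS(TEXT_NS, 'c') ?: 1);
        $s->parentNode->replaceChild($doc->createTextNode(str_repeat(' ', $count)), $s);
    }

    $html = '';
    // A union returns nodes in document order, so headings and paragraphs stay interleaved.
    $nodes = $xpath->query('//text:h[@text:outline-level="2"] | //text:p[@text:style-name="Standard"]');
    foreach ($nodes as $node) {
        $text = htmlspecialchars($node->textContent);
        $html .= $node->localName === 'h' ? "<h2>$text</h2>" : "<p>$text</p>";
    }
    return $html;
}
```

For the real file, $doc = new DOMDocument(); $doc->load("zip://folder/".$file.".odt#content.xml"); followed by odfToHtml($doc) should behave the same, provided the zip extension (for the zip:// wrapper) is available. Since everything stays in memory, XMLReader remains the lighter choice for very large documents.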