After learning how to "correctly" unset a node, I noticed that using PHP's unset() function leaves the tabs and spaces behind. So now I have this big chunk of white space in between nodes at times. I'm wondering if PHP iterates through blank spaces/returns/tabs and whether it would eventually slow down the system.
I'm also asking whether there's an easy way to remove the space that unset() leaves behind.
Thanks,
Ryan
ADDED NOTE:
This is how I removed the whitespace after unsetting a node, and it worked for me:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load($xmlPath);
$dom->save($xmlPath);
Whether it slows down the process: probably too little to care about.
And SimpleXML is just that, simple. If you require 'pretty' output, DOM is your friend:
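<?php
$xml = '
<xml>
<node>foo </node>
<other>bar</other>
</xml>';
// SimpleXML: unset() removes the element but leaves the surrounding whitespace in place
$x = new SimpleXMLElement($xml);
unset($x->other);
echo $x->asXML();
// DOM: with preserveWhiteSpace off, whitespace-only text nodes are dropped on load,
// so lastChild is the <other> element, and formatOutput re-indents the result on save
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml);
$dom->documentElement->removeChild($dom->documentElement->lastChild);
echo $dom->saveXML();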
Whitespace in XML is stored as text nodes, e.g.
<foo>
<bar>baz</bar>
</foo>
is really
<foo><- whitespace node
-><bar>baz</bar><- whitespace node
-></foo>
If you remove the <bar> node, you get
<foo><- whitespace node
-><- whitespace node
-></foo>
I think SimpleXML won't allow you to access the text nodes easily (maybe via XPath), but DOM does. See Wrikken's answer for details. Now that you know that whitespace is a node, you can also imagine that parsing it into a node takes up some CPU cycles. However, I'd say the speed impact is negligible. When in doubt, do a benchmark with some real-world data.
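Since DOM exposes the whitespace as ordinary text nodes, one way to avoid the leftover gap is to remove the neighbouring whitespace node together with the element. This is only a minimal sketch: the sample document and the choice to drop the preceding (rather than the following) whitespace node are my own assumptions, not something from the question.

```php
<?php
// Hypothetical sample document with indentation between the elements
$xml = "<foo>\n  <bar>baz</bar>\n  <qux>keep</qux>\n</foo>";

$dom = new DOMDocument();
$dom->loadXML($xml);

$bar = $dom->getElementsByTagName('bar')->item(0);

// If the node right before <bar> is whitespace-only text, remove it as well,
// so no blank line is left where <bar> used to be
$prev = $bar->previousSibling;
if ($prev instanceof DOMText && trim($prev->nodeValue) === '') {
    $prev->parentNode->removeChild($prev);
}
$bar->parentNode->removeChild($bar);

// Serialize just the root element (no XML declaration)
$result = $dom->saveXML($dom->documentElement);
echo $result, "\n";
```

The remaining whitespace node in front of <qux> keeps its indentation intact, so nothing needs to be reformatted afterwards.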
EDIT: Proof that whitespace really is parsed into nodes:
$xml = <<< XML
<foo>
<bar>baz</bar>
</foo>
XML;
$dom = new DOMDocument;
$dom->loadXML($xml);
foreach ($dom->documentElement->childNodes as $node) {
    var_dump($node);
}
gives
object(DOMText)#4 (0) {}
object(DOMElement)#6 (0) {}
object(DOMText)#4 (0) {}
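A related check, as a rough sketch: loading the same markup with preserveWhiteSpace turned off should make the whitespace-only text nodes disappear, which you can see by counting the root's child nodes. The variable names are just for illustration.

```php
<?php
$xml = "<foo>\n<bar>baz</bar>\n</foo>";

// Default behaviour: whitespace-only text nodes are kept
$keep = new DOMDocument();
$keep->loadXML($xml);

// preserveWhiteSpace must be set before loading to have any effect
$strip = new DOMDocument();
$strip->preserveWhiteSpace = false;
$strip->loadXML($xml);

$keepCount  = $keep->documentElement->childNodes->length;   // text, <bar>, text
$stripCount = $strip->documentElement->childNodes->length;  // <bar> only

echo "with whitespace nodes: $keepCount\n";
echo "without whitespace nodes: $stripCount\n";
```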
It's actually libxml that does the XML parsing, and the parser reads whitespace the same way as every other character in the input stream (or file). Most of the PHP XML APIs use libxml under the hood (XmlReader, XmlWriter, SimpleXML, Xslt, Dom...). Some of them give you access to whitespace (e.g. Dom, XmlReader), some don't (e.g. SimpleXML).
Quick answers to the questions asked:
I'm wondering if PHP iterates through
blank spaces/returns/tabs and whether
it would eventually slow down the
system.
No, PHP (or libxml) doesn't really iterate over it. Having more whitespace theoretically slows down the system, but the effect is so small that it can't be measured directly. You could test that yourself by removing all whitespace from your XML. It wouldn't make it noticeably faster.
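If you want to try that test, here is a rough sketch of how it could look. The synthetic data (2000 <item> elements), the run count and the timeParse() helper are all made up for illustration; real-world data will tell you more, and the numbers will vary between machines and runs.

```php
<?php
// Hypothetical synthetic data: the same 2000 elements, compact and indented
$items = [];
for ($i = 0; $i < 2000; $i++) {
    $items[] = "<item id=\"$i\">value $i</item>";
}
$compact  = '<root>' . implode('', $items) . '</root>';
$indented = "<root>\n    " . implode("\n    ", $items) . "\n</root>";

// Parse the same string $runs times and return the elapsed wall-clock seconds
function timeParse(string $xml, int $runs = 50): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        simplexml_load_string($xml);
    }
    return microtime(true) - $start;
}

printf("compact:  %.4f s\n", timeParse($compact));
printf("indented: %.4f s\n", timeParse($indented));
```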
I'm also asking whether there's an
easy to remove the space unset leaves
behind?
No easy way, I'm afraid. You can import your SimpleXML object into DOM and use formatOutput to completely remodel the whitespace, as suggested in another answer, or you can use a third-party library that will do it for you, but you won't find an easy, built-in way to do that.
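For what it's worth, the SimpleXML-to-DOM route could look roughly like this. It's a sketch under a couple of assumptions: the sample XML is made up, and instead of dom_import_simplexml() it reparses the asXML() string, because formatOutput only re-indents elements that no longer contain the old whitespace text nodes, and preserveWhiteSpace = false only drops those at load time.

```php
<?php
// Hypothetical sample document
$xml = "<xml>\n  <node>foo</node>\n  <other>bar</other>\n  <last>baz</last>\n</xml>";

$sx = new SimpleXMLElement($xml);
unset($sx->other);   // leaves a blank line where <other> used to be

// Reparse the serialized XML so the leftover whitespace nodes are discarded,
// then let DOM re-indent the output
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($sx->asXML());

// Serialize only the root element (no XML declaration)
$pretty = $dom->saveXML($dom->documentElement);
echo $pretty, "\n";
```

The extra parse costs a little, but it keeps the rest of your code on SimpleXML and only touches DOM at the point where you write the document out.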