I am building an RSS parser using the SimpleXML Class and I was wondering if using the DOMDocument class would improve the speed of the parser. I am parsing an rss document that is at least 1000 lines and I use almost all of the data from those 1000 lines. I am looking for the method that will take the least time to complete.
问题:
回答1:
SimpleXML
and DOMDocument
both use the same parser (libxml2
), so the parsing difference between them is negligible.
This is easy to verify:
function time_load_dd($xml, $reps) {
// discard first run to prime caches
for ($i=0; $i < 5; ++$i) {
$dom = new DOMDocument();
$dom->loadXML($xml);
}
$start = microtime(true);
for ($i=0; $i < $reps; ++$i) {
$dom = new DOMDocument();
$dom->loadXML($xml);
}
$stop = microtime(true) - $start;
return $stop;
}
function time_load_sxe($xml, $reps) {
for ($i=0; $i < 5; ++$i) {
$sxe = simplexml_load_string($xml);
}
$start = microtime(true);
for ($i=0; $i < $reps; ++$i) {
$sxe = simplexml_load_string($xml);
}
$stop = microtime(true) - $start;
return $stop;
}
function main() {
// This is a 1800-line atom feed of some complexity.
$url = 'http://feeds.feedburner.com/reason/AllArticles';
$xml = file_get_contents($url);
$reps = 10000;
$methods = array('time_load_dd','time_load_sxe');
echo "Time to complete $reps reps:\n";
foreach ($methods as $method) {
echo $method,": ",$method($xml,$reps), "\n";
}
}
main();
On my machine I get basically no difference:
Time to complete 10000 reps:
time_load_dd: 17.725028991699
time_load_sxe: 17.416455984116
The real issue here is what algorithms you are using and what you are doing with the data. 1000 lines is not a big XML document. Your slowdown will not be in memory usage or parsing speed but in your application logic.
回答2:
Well, I have encountered a HUGE performance difference between DomDocument
and SimpleXML
. I have ~ 15 MB big XML file with approx 50 000 elements like this:
...
<ITEM>
<Product>some product code</Product>
<Param>123</Param>
<TextValue>few words</TextValue>
</ITEM>
...
I only need to "read" those values and save them in PHP array. At first I tried DomDocument
...
$dom = new DOMDocument();
$dom->loadXML( $external_content );
$root = $dom->documentElement;
$xml_param_values = $root->getElementsByTagName('ITEM');
foreach ($xml_param_values as $item) {
$product_code = $item->getElementsByTagName('Product')->item(0)->textContent;
// ... some other operation
}
That script died after 60 seconds with maximum execution time exceeded error. Only 15 000 items of 50k were parsed.
So I rewrote the code to SimpleXML
version:
$xml = new SimpleXMLElement($external_content);
foreach($xml->xpath('ITEM') as $item) {
$product_code = (string) $item->Product;
// ... some other operation
}
After 1 second all was done.
I don't know how those functions are internally implemented in PHP, but in my application (and with my XML structure) there is really, REALLY HUGE performance difference between DomDocument
and SimpleXML
.