SimpleXML vs DOMDocument performance

Posted 2019-02-07 00:28

I am building an RSS parser with the SimpleXML class, and I am wondering whether switching to the DOMDocument class would make it faster. The RSS document I parse is at least 1000 lines long, and I use almost all of the data in it. I am looking for whichever method takes the least time to complete.

2 answers
一夜七次
Reply #2 · 2019-02-07 00:50

SimpleXML and DOMDocument both use the same parser (libxml2), so the parsing difference between them is negligible.

This is easy to verify:

function time_load_dd($xml, $reps) {
    // warm-up runs (not timed) to prime caches
    for ($i=0; $i < 5; ++$i) { 
        $dom = new DOMDocument();
        $dom->loadXML($xml);
    }
    $start = microtime(true);
    for ($i=0; $i < $reps; ++$i) { 
        $dom = new DOMDocument();
        $dom->loadXML($xml);
    }
    $stop = microtime(true) - $start;
    return $stop;
}
function time_load_sxe($xml, $reps) {
    // warm-up runs (not timed), same as in time_load_dd
    for ($i=0; $i < 5; ++$i) {
        $sxe = simplexml_load_string($xml);
    }
    $start = microtime(true);
    for ($i=0; $i < $reps; ++$i) { 
        $sxe = simplexml_load_string($xml);
    }
    $stop = microtime(true) - $start;
    return $stop;
}


function main() {
    // This is an 1800-line Atom feed of some complexity.
    $url = 'http://feeds.feedburner.com/reason/AllArticles';
    $xml = file_get_contents($url);
    $reps = 10000;
    $methods = array('time_load_dd','time_load_sxe');
    echo "Time to complete $reps reps:\n";
    foreach ($methods as $method) {
        echo $method,": ",$method($xml,$reps), "\n";
    }
}
main();

On my machine I get basically no difference:

Time to complete 10000 reps:
time_load_dd: 17.725028991699
time_load_sxe: 17.416455984116

The real issue here is which algorithms you use and what you do with the data. A 1000-line XML document is not big. Any slowdown will come from your application logic, not from memory usage or parsing speed.
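To see where the time actually goes, parsing and reading can be timed separately. The following is only a minimal sketch, assuming PHP 7 or later: `build_feed` and `time_parse_and_read` are hypothetical helper names, and the synthetic RSS-like document stands in for a real feed.

```php
<?php
// Hypothetical helper: builds a small RSS-like document with $n items.
function build_feed(int $n): string {
    $items = '';
    for ($i = 0; $i < $n; ++$i) {
        $items .= "<item><title>Title $i</title><link>http://example.com/$i</link></item>";
    }
    return "<rss version=\"2.0\"><channel><title>Demo</title>$items</channel></rss>";
}

// Returns array(seconds spent parsing, seconds spent reading, list of titles).
function time_parse_and_read(string $xml): array {
    $t0 = microtime(true);
    $sxe = simplexml_load_string($xml);
    $t1 = microtime(true);
    // "Application logic": copy every item title into a plain PHP array.
    $titles = array();
    foreach ($sxe->channel->item as $item) {
        $titles[] = (string) $item->title;
    }
    $t2 = microtime(true);
    return array($t1 - $t0, $t2 - $t1, $titles);
}

list($parse, $read, $titles) = time_parse_and_read(build_feed(200));
echo "parse: $parse\n";
echo "read:  $read\n";
echo "items: ", count($titles), "\n";
```

If the read step dominates on your own feed, that is the part worth optimizing, whichever class does the parsing.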

闹够了就滚
Reply #3 · 2019-02-07 00:54

Well, I have encountered a HUGE performance difference between DOMDocument and SimpleXML. I have an XML file of about 15 MB with approximately 50,000 elements like this:

...
<ITEM>
  <Product>some product code</Product>
  <Param>123</Param>
  <TextValue>few words</TextValue>
</ITEM>
...

I only need to read those values and save them in a PHP array. At first I tried DOMDocument:

$dom = new DOMDocument();
$dom->loadXML( $external_content );
$root = $dom->documentElement; 

$xml_param_values = $root->getElementsByTagName('ITEM');
foreach ($xml_param_values as $item) {
    $product_code = $item->getElementsByTagName('Product')->item(0)->textContent;
    // ... some other operation
}

That script died after 60 seconds with a "maximum execution time exceeded" error. Only 15,000 of the 50,000 items had been parsed.

So I rewrote the code using SimpleXML:

$xml = new SimpleXMLElement($external_content);
foreach($xml->xpath('ITEM') as $item) {
    $product_code = (string) $item->Product;
    // ... some other operation
}

After 1 second it was all done.

I don't know how those functions are implemented internally in PHP, but in my application (and with my XML structure) there is a really, REALLY HUGE performance difference between DOMDocument and SimpleXML.
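Anyone who wants to check this on their own data can time the two loops above side by side. The sketch below is only an illustration, assuming PHP 7 or later. The `ROOT` wrapper element, the generated values, and the helper names (`build_items`, `read_with_dom`, `read_with_sxe`, `read_with_domxpath`) are all made up for the test. The third variant, which uses DOMXPath, is not from my original code; it is included only as one more way to traverse with DOM. No timings are claimed here, since they depend on the PHP version and the document.

```php
<?php
// Hypothetical generator for a document shaped like the ITEM sample above.
// The ROOT wrapper is an assumption; the real file's root is not shown.
function build_items(int $n): string {
    $xml = '<ROOT>';
    for ($i = 0; $i < $n; ++$i) {
        $xml .= "<ITEM><Product>code$i</Product><Param>$i</Param>"
              . "<TextValue>few words</TextValue></ITEM>";
    }
    return $xml . '</ROOT>';
}

// DOMDocument with nested getElementsByTagName, as in the first attempt.
function read_with_dom(string $xml): array {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $codes = array();
    foreach ($dom->documentElement->getElementsByTagName('ITEM') as $item) {
        $codes[] = $item->getElementsByTagName('Product')->item(0)->textContent;
    }
    return $codes;
}

// SimpleXML with xpath, as in the rewrite.
function read_with_sxe(string $xml): array {
    $sxe = new SimpleXMLElement($xml);
    $codes = array();
    foreach ($sxe->xpath('ITEM') as $item) {
        $codes[] = (string) $item->Product;
    }
    return $codes;
}

// DOMDocument again, but selecting the Product nodes with one XPath query.
function read_with_domxpath(string $xml): array {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $codes = array();
    foreach ($xpath->query('/ROOT/ITEM/Product') as $product) {
        $codes[] = $product->textContent;
    }
    return $codes;
}

$xml = build_items(2000);
$results = array();
foreach (array('read_with_dom', 'read_with_sxe', 'read_with_domxpath') as $fn) {
    $start = microtime(true);
    $results[$fn] = $fn($xml);
    echo $fn, ': ', microtime(true) - $start, "\n";
}
```

All three functions should return the same list of product codes, so any difference in the printed times comes from the traversal method rather than from the data.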
