DOMDocument PHP Memory Leak

Posted 2019-01-12 00:56

Question:

Running PHP 5.3.6 under MAMP on a Mac, memory usage increases every few calls (between 3 and 8) until the script dies from memory exhaustion. How do I fix this?

libxml_use_internal_errors(true);
while(true){
 $dom = new DOMDocument();
 $dom->loadHTML(file_get_contents('http://www.ebay.com/'));
 unset($dom);
 echo memory_get_peak_usage(true) . '<br>'; flush();
}

Answer 1:

libxml_use_internal_errors(true) suppresses the error output, but libxml then buffers every error in an internal log that grows on each loop iteration. Either disable the internal logging and suppress the PHP warnings instead, or clear the internal log on each iteration, like this:

<?php
libxml_use_internal_errors(true);
while(true){
 $dom = new DOMDocument();
 $dom->loadHTML(file_get_contents('ebay.html'));
 unset($dom);
 // Toggling the setting off and back on discards the buffered errors
 libxml_use_internal_errors(false);
 libxml_use_internal_errors(true);
 echo memory_get_peak_usage(true) . "\r\n"; flush();
}
?>
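
The first option mentioned above (disable the internal log and suppress the warnings) is not shown in the answer. A minimal sketch of it might look like the following, assuming the @ operator is an acceptable way to silence the parser warnings; the $html string is a hypothetical placeholder, not the real page, and the loop is bounded so the sketch terminates.

```php
<?php
// Hypothetical placeholder markup with deliberate errors (assumption, not the eBay page).
$html = '<html><body><p>unclosed<div></span></body></html>';

// Keep libxml's internal error buffer switched off so nothing accumulates.
libxml_use_internal_errors(false);

for ($i = 0; $i < 1000; $i++) {
    $dom = new DOMDocument();
    // With buffering off, parse problems surface as PHP warnings;
    // the @ operator silences them for this one call.
    @$dom->loadHTML($html);
    unset($dom);
}

// Nothing was buffered, so the error list should be empty.
$buffered = count(libxml_get_errors());
echo $buffered, "\n";
```

On PHP 5.4 and later, loadHTML() also accepts an options argument, so passing LIBXML_NOERROR | LIBXML_NOWARNING is another way to keep the parser quiet without the @ operator. That argument is not available on the PHP 5.3.6 mentioned in the question.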


Answer 2:

Based on @Tak's answer and @FrancisAvila's comment, I found that this snippet works better for me:

while (true)
{
    $dom = new DOMDocument();

    if (libxml_use_internal_errors(true) === true) // previous setting was true?
    {
        libxml_clear_errors();
    }

    $dom->loadHTML(file_get_contents('ebay.html'));
}

print_r(libxml_get_errors()); // errors from the last iteration are accessible

This has the added benefits of 1) not discarding the errors of the last parse if you ever need to access them via libxml_get_errors(), and 2) calling libxml_clear_errors() only when necessary, since libxml_use_internal_errors() returns the previous setting state.
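
If the errors are actually needed, one possible pattern is to read them right after each parse and then clear the buffer. The sketch below assumes a hypothetical helper name, parseAndCollectErrors(), and a placeholder HTML string; neither comes from the answers above.

```php
<?php
// Hypothetical helper: parse one HTML string and return its libxml errors
// as plain strings, leaving the internal buffer empty afterwards.
function parseAndCollectErrors($html)
{
    // Enable buffering and remember the previous setting so it can be restored.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors(); // start from an empty buffer

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // LibXMLError exposes level, code, column, message, file and line.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();                 // do not let errors pile up across calls
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

// Placeholder markup (assumption); a real run would pass the downloaded page.
$html = '<html><body><p>unclosed<div></span></body></html>';
$first = parseAndCollectErrors($html);
$second = parseAndCollectErrors($html);
print_r($first);
```

Because the buffer is cleared inside the helper, each call reports only the errors of its own parse, and repeated calls should not grow libxml's error list.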



Answer 3:

You can try forcing the garbage collector to run with gc_collect_cycles(), but otherwise you're out of luck. PHP exposes very little control over its internal memory usage, let alone over memory used by an underlying extension library such as libxml.
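
A rough sketch of that suggestion, assuming a placeholder HTML string and a bounded loop so it terminates. Note that gc_collect_cycles() only reclaims PHP-level reference cycles, so on its own it would not be expected to shrink an error log held inside libxml.

```php
<?php
// Placeholder markup (assumption); the question parses a downloaded page instead.
$html = '<html><body><p>placeholder</p></body></html>';

$collected = 0;
for ($i = 0; $i < 100; $i++) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    unset($dom);

    // Force a collection pass every 10 iterations and log current usage.
    if ($i % 10 === 0) {
        $collected += gc_collect_cycles(); // returns the number of collected cycles
        echo memory_get_usage(true), "\n";
    }
}
echo "cycles collected: ", $collected, "\n";
```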



Answer 4:

Testing your script locally produces the same result. Pointing file_get_contents() at a local HTML file, however, produces consistent memory usage. It could be that the output from ebay.com changes every X calls.
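
One way to run that comparison is to save a single copy of the page and parse the same file on every iteration, so the input cannot vary between calls. The sketch below writes placeholder markup to a temporary file instead of downloading anything; the file contents and variable names are assumptions for illustration.

```php
<?php
// In a real test the local copy would be created once, for example:
//   file_put_contents('ebay.html', file_get_contents('http://www.ebay.com/'));
// Here a placeholder document is written to a temp file so the sketch runs offline.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><body><p>placeholder</p></body></html>');

libxml_use_internal_errors(true);

$samples = array();
for ($i = 0; $i < 20; $i++) {
    $dom = new DOMDocument();
    $dom->loadHTML(file_get_contents($path)); // identical input on every iteration
    unset($dom);
    libxml_clear_errors();
    $samples[] = memory_get_usage(true);      // record usage for later comparison
}

unlink($path); // remove the temporary copy
echo implode("\n", $samples), "\n";
```

If the recorded values stay flat with a fixed local file but climb when fetching the live URL, that would point at the changing response rather than at DOMDocument itself.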