XSLTProcessor xmlSAX2Characters: out of memory

Posted 2019-05-03 21:12

Question:

I have a page that loads a 500 MB XML file and parses it using an XSL template. The parser works perfectly in my local environment, where I am using WAMP.

On the web server, however, I get the following warning:

Warning: DOMDocument::load() [domdocument.load]: (null)xmlSAX2Characters: out of memory in /home/mydomain/public_html/xslt/largeFile.xml, line: 2031052 in /home/mydomain/public_html/xslt/parser_large.php on line 6

My code is below; line 6 loads the XML file:

<?php
$xslDoc = new DOMDocument();
$xslDoc->load("template.xslt");

$xmlDoc = new DOMDocument();
$xmlDoc->load("largeFile.xml");

$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);
echo $proc->transformToXML($xmlDoc);
?>

I have tried copying the php.ini file from the WAMP installation to the folder where the above code is located, but this has not helped. The memory limit in this php.ini file is memory_limit = 1000M.

Any advice or experience with this would be greatly appreciated.

Answer 1:

Here is the sad truth. There are two basic ways of working with XML: DOM-based, where the whole XML document is present in memory at once (with considerable overhead to make it fast to traverse), and SAX-based, where the file streams through memory and only a small portion of it is present at any given time.
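
For illustration, here is a minimal sketch of the streaming style in PHP using XMLReader, a pull parser built on the same streaming principle as SAX; the file name and the repeating element name "record" are placeholders for whatever your document actually contains:

<?php
// Stream the document with XMLReader: only the current node is held
// in memory, no matter how large the file is.
// "largeFile.xml" and "record" are placeholder names.
$reader = new XMLReader();
$reader->open("largeFile.xml");

$count = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === "record") {
        $count++;
        // Process the element here, e.g. $reader->getAttribute("id");
    }
}
$reader->close();
echo "Found $count record elements\n";
?>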

However, with DOM, large memory consumption is pretty much normal.

Now, the XSLT language in general allows constructs that access any part of the whole document at any time, so it requires the DOM style. Some programming languages have libraries that allow feeding SAX input into an XSLT processor, but this necessarily implies either restrictions on the XSLT language or memory consumption not much better than DOM's. PHP, however, has no way of making XSLT read SAX input.

That leaves us with alternatives to DOM; there is one, and it is called SimpleXML. SimpleXML is a little tricky to use if your document has namespaces. An ancient benchmark seems to indicate that it is somewhat faster, and probably also less wasteful with memory, than DOM on large files.
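
A minimal SimpleXML sketch follows; the file name, element names, and the namespace URI are placeholders, and the children() call with an explicit URI is what makes namespaced documents tricky:

<?php
// Load the whole document with SimpleXML (still an in-memory tree,
// just a lighter one than DOM). All names below are placeholders.
$xml = simplexml_load_file("largeFile.xml");
if ($xml === false) {
    die("Failed to parse XML");
}

// Without namespaces, child elements are plain properties:
foreach ($xml->record as $record) {
    echo (string) $record->title, "\n";
}

// With namespaces, children must be selected explicitly by URI:
// foreach ($xml->children("http://example.com/ns")->record as $record) { ... }
?>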

And finally, I was in your shoes once, in another programming language. The solution was to split the document into small ones based on simple rules: each small document contained a header copied from the whole document, one "detail" element, and a footer, keeping its format valid against the big XML file's schema. Each was processed using XSLT (assuming that processing of one detail element does not look into any other detail element) and the outputs were combined. This works like a charm, but it is not implemented in seconds.
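
In PHP, the closest equivalent of that split-and-transform approach is to walk the file with XMLReader and expand one element at a time into a tiny DOM document for the XSLT processor. Here is a sketch, assuming the big file is essentially a flat list of <detail> elements and the stylesheet can meaningfully transform one at a time:

<?php
// Load the stylesheet once.
$xslDoc = new DOMDocument();
$xslDoc->load("template.xslt");
$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);

// Stream the big file; only one <detail> subtree is in memory at a time.
$reader = new XMLReader();
$reader->open("largeFile.xml");

// Move the cursor to the first <detail> element.
while ($reader->read() && $reader->localName !== "detail");

while ($reader->localName === "detail") {
    // Expand just this element into a standalone mini-document...
    $chunk = new DOMDocument();
    $chunk->appendChild($chunk->importNode($reader->expand(), true));
    // ...transform it and emit the result immediately.
    echo $proc->transformToXML($chunk);
    // Jump to the next sibling <detail>, skipping the current subtree.
    $reader->next("detail");
}
$reader->close();
?>

Note that this only works when the transformation of each chunk is independent, exactly as the caveat above says.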

So, here are your options. Choose one.

  • Parse and process the XML using SAX (or XMLReader, as sketched above).
  • Use SimpleXML and hope that it will allow slightly larger files within the same memory.
  • Execute an external XSLT processor and hope that it will allow slightly larger files within the same memory (see the command sketch after this list).
  • Split and merge the XML using the method described above and apply XSLT to small chunks only. This method is only practical with some schemas.
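
For the external-processor option, here is a sketch assuming the xsltproc command-line tool (part of libxslt) is installed on the server; shared hosts often do not provide it, so treat this as illustrative:

<?php
// Run the transformation in a separate process, outside PHP's
// memory_limit. xsltproc still builds the whole input document in
// its own process memory, so this buys headroom, not a different
// algorithm.
passthru("xsltproc template.xslt largeFile.xml", $exitCode);
if ($exitCode !== 0) {
    echo "xsltproc failed with exit code $exitCode\n";
}
?>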