Replacing Parsing function with XMLReader in Speci

2019-03-03 15:02发布

I'm trying to utilize a PHP script to parse a large XML file (around 450 MB) to MYSQL database into certain structure and definitions to included XML elements. The problem is that the original script uses file_get_contents and SimpleXMLElement to get it done, but the corn job executed by the server halts due to the volume of the XML file. I'm no PHP expert, so I bought this XMLSplit software and divided the XML into 17 separated XML files each at size of 30 MB, parsed them one by one using the same script. However, the output database was missing a lot of input, and I have serious doubts whether this would be the same output of the original file if left not divided automatically and parsed one by one.

So, I've decided to use XMLReader with this exact PHP script to parse this big XML file, but so far I couldn't manage to simply replace the parsing code and keep other functionality intact.

I'm including the script below, I'd really appreciate if someone helps me to do so.

<?php
set_time_limit(0);
ini_set('memory_limit', '1024M');

include_once('../db.php');
include_once(DOC_ROOT.'/include/func.php');

mysql_query("TRUNCATE screenshots_list");
mysql_query("TRUNCATE pages");
mysql_query("TRUNCATE page_screenshots");


$xmlstr = file_get_contents('t_info.xml');
$xml = new SimpleXMLElement($xmlstr);
foreach ($xml->template as $item)

{
//print_r($item);
$sql = sprintf("REPLACE INTO templates SET id = %d, state = %d, price = %d,           exc_price = %d, inserted_date = '%s', update_date = '%s', downloads = %d, type_id = %d, type_name = '%s', is_flash = %d, is_adult = %d, width = '%s', author_id = %d, author_nick = '%s', package_id = %d, is_full_site = %d, is_real_size = %d, keywords = '%s', sources = '%s', description = '%s', software_required = '%s'", $item->id, $item->state, $item->price, $item->exc_price, $item->inserted_date, $item->update_date, $item->downloads, $item->template_type->type_id, $item->template_type->type_name, $item->is_flash, $item->is_adult, $item->width, $item->author->author_id, $item->author->author_nick, $item->package->package_id, $item->is_full_site, $item->is_real_size, $item->keywords, $item->sources, $item->description, $item->software_required);
//echo '<br>'.$sql;
mysql_query($sql);
//print_r($item->screenshots_list->screenshot);
foreach ($item->screenshots_list->screenshot as $scr) {
    $main = (!empty($scr->main_preview)) ? 1 : 0;
    $small = (!empty($scr->small_preview)) ? 1 : 0;
    insert_data($item->id, 'screenshots_list', 0, $scr->uri, $scr->filemtime, $main, $small);       
}
foreach ($item->styles->style as $st) {
    insert_data($item->id, 'styles', $st->style_id, $st->style_name);       
}
foreach ($item->categories->category as $cat) {
    insert_data($item->id, 'categories', $cat->category_id, $cat->category_name);       
}
foreach ($item->sources_available_list->source as $so) {
    insert_data($item->id, 'sources_available_list', $so->source_id, '');       
}
foreach ($item->software_required_list->software as $soft) {
    insert_data($item->id, 'software_required_list', $soft->software_id, '');       
}
//print_r($item->pages->page);
if (!empty($item->pages->page)) {
    foreach ($item->pages->page as $p) {
        mysql_query(sprintf("REPLACE INTO pages SET tpl_id = %d, name = '%s', id = NULL ", $item->id, $p->name));
        $page_id = mysql_insert_id();
        if (!empty($p->screenshots->scr)) {
            foreach ($p->screenshots->scr as $psc) {
                $href = (!empty($psc->href)) ? (string)$psc->href : '';
                mysql_query(sprintf("REPLACE INTO page_screenshots SET page_id = %d, description = '%s', uri = '%s', scr_type_id = %d, width = %d, height = %d, href = '%s'", $page_id, $psc->description, $psc->uri, $psc->scr_type_id, $psc->width, $psc->height, $href));
            }
        }
    }
}}?>

To highlight the code lines in question, This is the part where I'm trying to replace with XMLReader method without affecting the functionality of rest of the script:

   $xmlstr = file_get_contents('t_info.xml');
   $xml = new SimpleXMLElement($xmlstr);
   foreach ($xml->template as $item) {

I'd really appreciate your solutions...

标签: php xml parsing
1条回答
可以哭但决不认输i
2楼-- · 2019-03-03 15:14

It is possible to expand a XML reader position to a DOMElement. This element is not associated with an DOMDocument, so it can not be converted into a SimpleXMLElement directly, but it can be imported into a DOMDocument.

$xml = <<<'XML'
<templates>
  <template>
     <styles>
        <style>TEST</style>
     </styles>
  </template>
</templates>
XML;

$reader = new XMLReader;
$reader->open('data://text/xml;base64,'.base64_encode($xml));

$dom = new DOMDocument;

// look for the first template element
while ($reader->read() && $reader->localName !== 'template') {
  continue;
}

// while you have an template element
while ($reader->localName === 'template') {
  // convert to SimpleXMLElement
  $element = simplexml_import_dom(
    // expand to a DOMElement in the prepared document object
    $reader->expand($dom)
  );
  var_dump(
    $element 
  );
  // move to the next template sibling
  $reader->next('template');
}

Output:

object(SimpleXMLElement)#3 (1) {
  ["styles"]=>
  object(SimpleXMLElement)#4 (1) {
    ["style"]=>
    string(4) "TEST"
  }
}

I usually use DOM+Xpath and do not convert it to SimpleXML, but this approach should work fine for your problem.

查看更多
登录 后发表回答