SimpleXML & PHP: Extract part of XML document & co

Posted 2019-06-02 04:49

Question:

Consider the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<OS>
    <data>
        <OSes>
            <centos>
                <v_5>
                    <i386>
                        <id>centos5-32</id>
                        <name>CentOS 5 - 32 bit</name>
                        <version>5</version>
                        <architecture>32</architecture>
                        <os>centos</os>
                    </i386>
                    <x86_64>
                        <id>centos5-64</id>
                        <name>CentOS 5 - 64 bit</name>
                        <version>5</version>
                        <architecture>64</architecture>
                        <os>centos</os>
                    </x86_64>
                </v_5>
                <v_6>
                    <i386>
                        <id>centos6-32</id>
                        <name>CentOS 6 - 32 bit</name>
                        <version>6</version>
                        <architecture>32</architecture>
                        <os>centos</os>
                    </i386>
                    <x86_64>
                        <id>centos6-64</id>
                        <name>CentOS 6 - 64 bit</name>
                        <version>6</version>
                        <architecture>64</architecture>
                        <os>centos</os>
                    </x86_64>
                </v_6>
            </centos>
            <ubuntu>
                <v_10>
                    <i386>
                        <id>ubuntu10-32</id>
                        <name>Ubuntu 10 - 32 bit</name>
                        <version>10</version>
                        <architecture>32</architecture>
                        <os>ubuntu</os>
                    </i386>
                    <amd64>
                        <id>ubuntu10-64</id>
                        <name>Ubuntu 10 - 64 bit</name>
                        <version>10</version>
                        <architecture>64</architecture>
                        <os>ubuntu</os>
                    </amd64>
                </v_10>
            </ubuntu>
        </OSes>
    </data>
</OS>

From the XML document above, I want to extract the following five element nodes:

  1. <id>
  2. <name>
  3. <version>
  4. <architecture>
  5. <os>

and have them as an array. I tried the following:

<?php
require_once "xml.php"; // assumed to define $xmlstr, the XML string shown above

try {
    $xml = new SimpleXMLElement($xmlstr);
    foreach ($xml->xpath('//id | //name | //version | //architecture | //os') as $record) {
        echo $record;
    }
} catch (Exception $e) {
    echo $e->getMessage();
}

The above code works, but each record is a separate object. I want to consolidate all five element nodes of a record into one array element, something like this:

$osList = Array( [0] => Array(
                               ["id"] => "<id>",
                               ["name"] => "<name>",
                               ["version"] => "<version>",
                               ....
)
 .....
);

The syntax isn't correct, but you get the idea. Any idea how to do this?

Answer 1:

This might help:

$obj = new SimpleXMLElement($xml); // $xml holds the XML string
$rtn = array();
$cnt = 0;
// each match is a version node (v_5, v_6, v_10)
foreach($obj->xpath('//OSes/*/*') as $rec)
{
  foreach ($rec as $rec_obj)
  {
    if (!isset($rtn[$cnt]))
    {
      $rtn[$cnt] = array();
    }

    foreach ($rec_obj as $name=>$val)
    {
      $rtn[$cnt][(string)$name] = (string)$val;
    }
    ++$cnt;
  }
}


Answer 2:

After modifying the XPath expression, as others also suggested, I arrived at the following. A callback re-formats each XPath result node, and array_reduce iterates over the result and returns the converted array:

$xml = new SimpleXMLElement($xmlstr);
$elements = array_reduce(
    $xml->xpath('//OSes/*/*'),
    function($v, $w) {
        $w = array_values((array) $w); // convert result to array
        foreach($w as &$d) $d = (array) $d; // convert inner elements to array
        return array_merge($v, $w); // merge with existing
    }, 
    array() // empty elements at start
);

Output:

Array
(
    [0] => Array
        (
            [id] => centos5-32
            [name] => CentOS 5 - 32 bit
            [version] => 5
            [architecture] => 32
            [os] => centos
        )

    [1] => Array
        (
            [id] => centos5-64
            [name] => CentOS 5 - 64 bit
            [version] => 5
            [architecture] => 64
            [os] => centos
        )

    [2] => Array
        (
            [id] => centos6-32
            [name] => CentOS 6 - 32 bit
            [version] => 6
            [architecture] => 32
            [os] => centos
        )

    [3] => Array
        (
            [id] => centos6-64
            [name] => CentOS 6 - 64 bit
            [version] => 6
            [architecture] => 64
            [os] => centos
        )

    [4] => Array
        (
            [id] => ubuntu10-32
            [name] => Ubuntu 10 - 32 bit
            [version] => 10
            [architecture] => 32
            [os] => ubuntu
        )

    [5] => Array
        (
            [id] => ubuntu10-64
            [name] => Ubuntu 10 - 64 bit
            [version] => 10
            [architecture] => 64
            [os] => ubuntu
        )

)
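
The two (array) casts carry the conversion in the array_reduce version. The following is only a minimal sketch on a reduced fragment (the $fragment string and the variable names are made up for illustration) showing what each cast yields. It assumes the leaf elements hold only text and no attributes, as in the question's document:

```php
<?php
// Hypothetical, reduced fragment: one version node with two architecture records.
$fragment = '<v_5>'
    . '<i386><id>centos5-32</id><os>centos</os></i386>'
    . '<x86_64><id>centos5-64</id><os>centos</os></x86_64>'
    . '</v_5>';

$node = new SimpleXMLElement($fragment);

// First cast: child element names become keys, values stay SimpleXMLElement objects.
$children = (array) $node;

// array_values() drops the element names so that a later array_merge() appends
// the records instead of overwriting entries that share a string key like "i386".
$records = array_values($children);

// Second cast: the text-only children of each record become plain strings.
$records = array_map(function ($record) {
    return (array) $record;
}, $records);

print_r($records);
```

Without the array_values() step, the i386 record of v_6 would replace the i386 record of v_5 during the merge, because array_merge() overwrites duplicate string keys.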

Alternatively, you can keep the original XPath result and build a two-level array from it: each time a key already exists in the current entry, a new entry is started:

try
{
    $xml = new SimpleXMLElement($xmlstr);
    $elements = array();
    $curr = NULL;
    foreach($xml->xpath('//id | //name | //version | //architecture | //os') as $record)
    {
        $key = $record->getName();
        $value = (string) $record;
        if (!$curr || array_key_exists($key, $curr)) {
            unset($curr);
            $curr = array();
            $elements[] = &$curr;
        }
        $curr[$key] = $value;
    }
    unset($curr);
}
catch(Exception $e)
{
    echo $e->getMessage();
}

The result is then:

Array
(
    [0] => Array
        (
            [id] => centos5-32
            [name] => CentOS 5 - 32 bit
            [version] => 5
            [architecture] => 32
            [os] => centos
        )

    [1] => Array
        (
            [id] => centos5-64
            [name] => CentOS 5 - 64 bit
            [version] => 5
            [architecture] => 64
            [os] => centos
        )

    [2] => Array
        (
            [id] => centos6-32
            [name] => CentOS 6 - 32 bit
            [version] => 6
            [architecture] => 32
            [os] => centos
        )

    [3] => Array
        (
            [id] => centos6-64
            [name] => CentOS 6 - 64 bit
            [version] => 6
            [architecture] => 64
            [os] => centos
        )

    [4] => Array
        (
            [id] => ubuntu10-32
            [name] => Ubuntu 10 - 32 bit
            [version] => 10
            [architecture] => 32
            [os] => ubuntu
        )

    [5] => Array
        (
            [id] => ubuntu10-64
            [name] => Ubuntu 10 - 64 bit
            [version] => 10
            [architecture] => 64
            [os] => ubuntu
        )

)
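
The grouping loop starts a new record whenever a key repeats, so it depends on the nodes arriving in document order and on each record's first field already being present in the previous record. As an alternative sketch (not part of the answer above), an XPath predicate can select the record elements themselves. This assumes a record is exactly an element that has an <id> child; the sample XML is shortened and the variable names are illustrative:

```php
<?php
$xmlstr = <<<XML
<OS><data><OSes>
  <centos><v_5>
    <i386><id>centos5-32</id><name>CentOS 5 - 32 bit</name></i386>
    <x86_64><id>centos5-64</id><name>CentOS 5 - 64 bit</name></x86_64>
  </v_5></centos>
  <ubuntu><v_10>
    <amd64><id>ubuntu10-64</id><name>Ubuntu 10 - 64 bit</name></amd64>
  </v_10></ubuntu>
</OSes></data></OS>
XML;

$xml = new SimpleXMLElement($xmlstr);
$osList = array();

// "//*[id]" matches every element that has an <id> child, i.e. one node per record.
foreach ($xml->xpath('//*[id]') as $record) {
    $row = array();
    foreach ($record->children() as $field) {
        // getName() gives the tag name; the string cast gives the text content.
        $row[$field->getName()] = (string) $field;
    }
    $osList[] = $row;
}

print_r($osList);
```

Because each record is built from its own children, a missing or reordered field in one record does not affect its neighbours.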


Answer 3:

Try this:

// helper functions for array_map:
function arrayval1($any) {
  return array_values((array)$any);
}
function arrayval2($any) {
  return (array)$any;
}

// $xml is a SimpleXMLElement; each match is a version node holding OS records:
$oses = $xml->xpath('//OSes/*/*');
// each version node -> a list of its OS record objects:
$oses = array_map('arrayval1', $oses);
// merge the lists into one flat array of record objects:
$oses = call_user_func_array('array_merge', $oses);
// record objects -> arrays
$oses = array_map('arrayval2', $oses);
print_r($oses);

My result:

Array
(
    [0] => Array
        (
            [id] => centos5-32
            [name] => CentOS 5 - 32 bit
            [version] => 5
            [architecture] => 32
            [os] => centos
        )

    [1] => Array
        (
            [id] => centos5-64
            [name] => CentOS 5 - 64 bit
            [version] => 5
            [architecture] => 64
            [os] => centos
        )

    [2] => Array
        (
            [id] => centos6-32
            [name] => CentOS 6 - 32 bit
            [version] => 6
            [architecture] => 32
            [os] => centos
        )

    [3] => Array
        (
            [id] => centos6-64
            [name] => CentOS 6 - 64 bit
            [version] => 6
            [architecture] => 64
            [os] => centos
        )

    [4] => Array
        (
            [id] => ubuntu10-32
            [name] => Ubuntu 10 - 32 bit
            [version] => 10
            [architecture] => 32
            [os] => ubuntu
        )

    [5] => Array
        (
            [id] => ubuntu10-64
            [name] => Ubuntu 10 - 64 bit
            [version] => 10
            [architecture] => 64
            [os] => ubuntu
        )

)

If you're using PHP >= 5.3 (of course you are, why wouldn't you?), you can omit the nasty temporary function definitions and use cool anonymous functions for the mapping:

// an array of xml objects:
$oses = array_map(function($os) {
  return array_values((array)$os);
}, $oses);
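
Put together, the whole pipeline with anonymous functions might look like the following sketch. It assumes $xmlstr holds the XML (shortened here to two fields per record) and keeps call_user_func_array from the version above:

```php
<?php
// Shortened sample document; the real one has five fields per record.
$xmlstr = '<OS><data><OSes>'
    . '<centos><v_5>'
    . '<i386><id>centos5-32</id><version>5</version></i386>'
    . '<x86_64><id>centos5-64</id><version>5</version></x86_64>'
    . '</v_5></centos>'
    . '<ubuntu><v_10>'
    . '<i386><id>ubuntu10-32</id><version>10</version></i386>'
    . '</v_10></ubuntu>'
    . '</OSes></data></OS>';

$xml = new SimpleXMLElement($xmlstr);

// One SimpleXMLElement per version node (v_5, v_10).
$oses = $xml->xpath('//OSes/*/*');

// Each version node becomes a list of its architecture records.
$oses = array_map(function ($version) {
    return array_values((array) $version);
}, $oses);

// Concatenate the per-version lists into one flat list of records.
$oses = call_user_func_array('array_merge', $oses);

// Convert every record object into a plain array of strings.
$oses = array_map(function ($record) {
    return (array) $record;
}, $oses);

print_r($oses);
```

Note that the values come back as strings ("5", not 5), since SimpleXML does not infer types from the text content.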