PHP/XML - how to read multiple sub-elements

Posted 2020-03-08 06:52

Question:

I need to create an array with all the subject values in this XML file. The ISIN list seems to work fine (the first property value), but the subject values do not work.

I would like to end up with an array looking something like this:

$Companys = array(
    0 => array("isin" => "DK0010247014", "company" => "AAB"),
    1 => array("isin" => "DK0015250344", "company" => "ALM BRAND"),
    2 => array("isin" => "DK0015998017", "company" => "BAVARIAN NORDI"),
    3 => array("isin" => "DK0010259027", "company" => "DFDS"),
    4 => array("isin" => "DK0010234467", "company" => "FLSMIDTH & CO"),
);

This is an example of one of the files I am trying to parse:

<doc>
    <id>123456</id>
    <version>4.0</version>
    <consnr>7861</consnr>
    <doctype>10</doctype>
    <dest>99</dest>
    <created>2013-05-15 14:18:16</created>
    <source>Direkt-DK</source>
    <language>DA</language>
    <texttype>This is a type</texttype>
    <premium>False</premium>
    <header>This is a header</header>
    <text>
        <para format="Text">This is a paragraph</para>
        <para format="Text">This is a paragraph</para>
        <para format="Text">This is a paragraph</para>
        <para format="Text">This is a paragraph</para>
        <para format="Text"/>
        <para format="Text">This is a paragraph</para>
        <para format="Byline"/>
        <para format="Byline">contents og the by line</para>
        <para format="Byline"/>
        <para format="Byline"/>
    </text>
    <subjects>
        <subject value="AAB" weight="Main">
            <property value="DK0010247014" type2="isin" type1="identificator"/>
            <property value="CSE:AAB" type2="ticker" type1="identificator"/>
            <property type1="sector" type2="GICS" type3="1" value="25"/>
            <property type1="sector" type2="GICS" type3="2" value="2530"/>
            <property type1="sector" type2="GICS" type3="3" value="253010"/>
            <property type1="sector" type2="GICS" type3="4" value="25301030"/>
        </subject>
        <subject value="ALM BRAND" weight="Main">
            <property value="DK0015250344" type2="isin" type1="identificator"/>
            <property value="CSE:ALMB" type2="ticker" type1="identificator"/>
            <property type1="sector" type2="GICS" type3="1" value="40"/>
            <property type1="sector" type2="GICS" type3="2" value="4030"/>
            <property type1="sector" type2="GICS" type3="3" value="403010"/>
            <property type1="sector" type2="GICS" type3="4" value="40301040"/>
        </subject>
        <subject value="BAVARIAN NORDI" weight="Main">
            <property value="DK0015998017" type2="isin" type1="identificator"/>
            <property value="CSE:BAVA" type2="ticker" type1="identificator"/>
            <property type1="sector" type2="GICS" type3="1" value="35"/>
            <property type1="sector" type2="GICS" type3="2" value="3520"/>
            <property type1="sector" type2="GICS" type3="3" value="352010"/>
            <property type1="sector" type2="GICS" type3="4" value="35201010"/>
        </subject>
        <subject value="DFDS" weight="Main">
            <property value="DK0010259027" type2="isin" type1="identificator"/>
            <property value="CSE:DFDS" type2="ticker" type1="identificator"/>
            <property type1="sector" type2="GICS" type3="1" value="20"/>
            <property type1="sector" type2="GICS" type3="2" value="2030"/>
            <property type1="sector" type2="GICS" type3="3" value="203030"/>
            <property type1="sector" type2="GICS" type3="4" value="20303010"/>
        </subject>
        <subject value="FLSMIDTH & CO" weight="Main">
            <property value="DK0010234467" type2="isin" type1="identificator"/>
            <property value="CSE:FLS" type2="ticker" type1="identificator"/>
            <property type1="sector" type2="GICS" type3="1" value="20"/>
            <property type1="sector" type2="GICS" type3="2" value="2010"/>
            <property type1="sector" type2="GICS" type3="3" value="201030"/>
            <property type1="sector" type2="GICS" type3="4" value="20103010"/>
        </subject>
    </subjects>
</doc>

Script:

<?
    foreach($xmlObj->subjects->subject as $b ){
        $isin = $b->property;
        $company = $b->attributes();
        #$company = $b->attributes()->value;
        If($isin && $isinlist == 'null') $isinlist = $isin['value'];
        ElseIf ($isin && $isinlist) $isinlist .= ','.$isin['value'];
        If($company && $companylist == 'null') $companylist = $company['value'];
        ElseIf ($company && $companylist) $companylist .= ','.$company['value'];
        var_dump($company->value[0]);
    }
?>

Answer 1:

The main problem is finding a child element based on the value of one of its attributes. Because several children share the same element name, you cannot tell them apart by name alone.

In your concrete example, you need the property child whose type2 attribute is "isin".

This is possible either with XPath (this website already has a lot of Q&A material about that, for example "SimpleXML: Selecting Elements Which Have A Certain Attribute Value") or by extending SimpleXMLElement with a method that does just that:

class MyElement extends SimpleXMLElement
{
    public function getChildByAttributeValue($name, $value) {
        foreach($this as $child)
        {
            if ($value === (string) $child[$name]) {
                return $child;
            }
        }
    }
}

You can then use MyElement instead of SimpleXMLElement:

$xml = simplexml_load_string($buffer, 'MyElement');
                                      ###########

and just map your values to an array:

$map = function(MyElement $subject) {
    return [
        (string) $subject['value'],
        (string) $subject->getChildByAttributeValue('type2', 'isin')['value'],
    ];
};

print_r(array_map($map, $xml->xpath('//subject')));

Given that $buffer holds the XML from your question (with the encoding error removed: the bare & in "FLSMIDTH & CO" is not well-formed XML and must be written as &amp;), this produces the following output:

Array
(
    [0] => Array
        (
            [0] => AAB
            [1] => DK0010247014
        )

    [1] => Array
        (
            [0] => ALM BRAND
            [1] => DK0015250344
        )

    [2] => Array
        (
            [0] => BAVARIAN NORDI
            [1] => DK0015998017
        )

    [3] => Array
        (
            [0] => DFDS
            [1] => DK0010259027
        )

    [4] => Array
        (
            [0] => FLSMIDTH & CO
            [1] => DK0010234467
        )

)
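
If you want the keyed layout from the question ("isin" and "company" keys rather than numeric indexes), the mapping function can return an associative array instead. The following is only a sketch under some assumptions: the sample $buffer is a shortened two-subject version of the XML with the ampersand already escaped, the $companies variable name is made up for illustration, and the helper gets an explicit "return null" so a missing ISIN can be handled.

```php
<?php
// Same helper as above: returns the first child whose attribute $name equals $value.
class MyElement extends SimpleXMLElement
{
    public function getChildByAttributeValue($name, $value)
    {
        foreach ($this as $child) {
            if ($value === (string) $child[$name]) {
                return $child;
            }
        }
        return null; // no matching child found
    }
}

// Shortened sample input (assumption); note the escaped "&amp;".
$buffer = <<<XML
<doc>
    <subjects>
        <subject value="AAB" weight="Main">
            <property value="DK0010247014" type2="isin" type1="identificator"/>
            <property value="CSE:AAB" type2="ticker" type1="identificator"/>
        </subject>
        <subject value="FLSMIDTH &amp; CO" weight="Main">
            <property value="DK0010234467" type2="isin" type1="identificator"/>
            <property value="CSE:FLS" type2="ticker" type1="identificator"/>
        </subject>
    </subjects>
</doc>
XML;

$xml = simplexml_load_string($buffer, 'MyElement');

// Build one keyed row per <subject>; an absent ISIN becomes an empty string.
$companies = array_map(function (MyElement $subject) {
    $isin = $subject->getChildByAttributeValue('type2', 'isin');
    return [
        'isin'    => $isin === null ? '' : (string) $isin['value'],
        'company' => (string) $subject['value'],
    ];
}, $xml->xpath('//subject'));

print_r($companies);
```

The (string) casts matter here: without them the array would hold SimpleXMLElement objects rather than plain strings.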

The full code example:

class MyElement extends SimpleXMLElement
{
    public function getChildByAttributeValue($name, $value) {
        foreach($this as $child)
        {
            if ($value === (string) $child[$name]) {
                return $child;
            }
        }
    }
}

$xml = simplexml_load_string($buffer, 'MyElement');

$map = function(MyElement $subject) {
    return [
        (string) $subject['value'],
        (string) $subject->getChildByAttributeValue('type2', 'isin')['value'],
    ];
};

print_r(array_map($map, $xml->xpath('//subject')));
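
For comparison, the XPath route mentioned at the start of this answer needs no subclass. This is a minimal sketch, assuming the same document structure (shortened here to two subjects) and an already escaped ampersand; the $companies name is again only illustrative. The relative expression property[@type2="isin"] is evaluated against each subject, so it only sees that subject's own children.

```php
<?php
// Shortened sample input (assumption); note the escaped "&amp;".
$buffer = <<<XML
<doc>
    <subjects>
        <subject value="AAB" weight="Main">
            <property value="DK0010247014" type2="isin" type1="identificator"/>
            <property value="CSE:AAB" type2="ticker" type1="identificator"/>
        </subject>
        <subject value="FLSMIDTH &amp; CO" weight="Main">
            <property value="DK0010234467" type2="isin" type1="identificator"/>
            <property value="CSE:FLS" type2="ticker" type1="identificator"/>
        </subject>
    </subjects>
</doc>
XML;

$xml = simplexml_load_string($buffer);

$companies = [];
foreach ($xml->xpath('//subject') as $subject) {
    // Relative XPath: <property> children of this <subject> with type2="isin".
    $matches = $subject->xpath('property[@type2="isin"]');
    $companies[] = [
        // An empty or failed result falls back to an empty string.
        'isin'    => $matches ? (string) $matches[0]['value'] : '',
        'company' => (string) $subject['value'],
    ];
}

print_r($companies);
```

Which variant to prefer is mostly a matter of taste: XPath keeps the selection logic in one expression, while the subclass gives you a reusable method on every element.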