Parse XML with namespaces using SimpleXML in PHP

Posted 2020-02-14 20:22

Question:

I am trying to parse an XML document like this:

<?xml version="1.0" encoding="UTF-8"?>
<gml:FeatureCollection 
    xmlns:ogc="http://www.opengis.net/ogc" 
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink" 
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:p="http://example.org">
    <gml:featureMember>
        <p:Point>
            <gml:pointProperty>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84307585 43.46031547</gml:pos>
                </gml:Point>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84299411 43.46018513</gml:pos>
                </gml:Point>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84299935 43.45998723</gml:pos>
                </gml:Point>
                <!-- 
                    ... many more <gml:Point> nodes ...
                --> 
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84309913 43.46054546</gml:pos>
                </gml:Point>
                <gml:Point srsName="epsg:4258">
                    <gml:pos>-3.84307585 43.46031547</gml:pos>
                </gml:Point>
            </gml:pointProperty>
        </p:Point>
    </gml:featureMember>
</gml:FeatureCollection>

I want to save each gml:pos value to a DB, but for the moment I would be happy just printing them on a web page (echo...).

$output = simplexml_load_string($output);
$xml = $output->getNamespaces(true); 
//print_r( $xml);
$xml_document = $output->children($xml["p"]);
foreach($xml_document->Point->children($xml["gml"]);
    echo $xml_point->Point[0];
echo $xml->FeatureCollection; 
}

$output holds the complete XML, with tons of coordinates in gml:Point elements.

I am trying to reach the points using namespaces, but I must be doing something wrong, because I can't print anything except the word Array (even with print_r...).

Answer 1:

You should not read the namespaces from the document. A namespace is a unique string (a URI) that identifies the XML vocabulary an element belongs to. Your XML is a good example of that, because it has Point elements in two different namespaces.

p:Point is {http://example.org}:Point
gml:Point is {http://www.opengis.net/gml}:Point

Namespace prefixes like p and gml are aliases that make a document smaller and more readable. They are only valid for the element that declares them and its descendants, and they can be redefined at any point. More importantly, they are only valid within that document.
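To illustrate why the prefix itself carries no meaning, here is a minimal sketch. The two one-element documents and the alias geo are made up for the example; only the namespace URI comes from the question. Both documents bind different prefixes to the same URI, and one XPath expression with its own alias matches both:

```php
<?php
// Two hypothetical documents: same namespace URI, different prefixes.
$docA = '<gml:Point xmlns:gml="http://www.opengis.net/gml"><gml:pos>1 2</gml:pos></gml:Point>';
$docB = '<g:Point xmlns:g="http://www.opengis.net/gml"><g:pos>3 4</g:pos></g:Point>';

$found = [];
foreach ([$docA, $docB] as $xml) {
    $element = simplexml_load_string($xml);
    // "geo" is our own alias; it only has to map to the right URI.
    $element->registerXPathNamespace('geo', 'http://www.opengis.net/gml');
    foreach ($element->xpath('//geo:pos') as $pos) {
        $found[] = (string) $pos;
    }
}

print_r($found);
```

Code that relied on the prefix gml being present in the document would miss the second document, which is why matching on the URI is the safer habit.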

So to read XML, you define your own prefixes for the namespaces and use them with XPath, or you use the namespace-aware variants of the DOM methods, such as getAttributeNS(). XPath is by far the more elegant solution. You can use the same prefixes as the document or different ones.

$element = simplexml_load_string($content);
$element->registerXPathNamespace('gml', 'http://www.opengis.net/gml');
$element->registerXPathNamespace('p', 'http://example.org');

$result = [];
$positions = $element->xpath('//p:Point[1]//gml:pos');
foreach ($positions as $pos) {
  $result[] = (string)$pos;
}

var_dump($result);

Output: https://eval.in/159739

array(5) {
  [0]=>
  string(23) "-3.84307585 43.46031547"
  [1]=>
  string(23) "-3.84299411 43.46018513"
  [2]=>
  string(23) "-3.84299935 43.45998723"
  [3]=>
  string(23) "-3.84309913 43.46054546"
  [4]=>
  string(23) "-3.84307585 43.46031547"
}
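For comparison, the namespace-aware DOM route mentioned above could look roughly like the sketch below. It uses DOMDocument::getElementsByTagNameNS(), which takes the namespace URI and the local name. The $content sample is a shortened, hypothetical stand-in for the full document from the question:

```php
<?php
// Shortened, hypothetical sample standing in for the full document.
$content = <<<XML
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:p="http://example.org">
  <gml:featureMember>
    <p:Point>
      <gml:pointProperty>
        <gml:Point srsName="epsg:4258"><gml:pos>-3.84307585 43.46031547</gml:pos></gml:Point>
        <gml:Point srsName="epsg:4258"><gml:pos>-3.84299411 43.46018513</gml:pos></gml:Point>
      </gml:pointProperty>
    </p:Point>
  </gml:featureMember>
</gml:FeatureCollection>
XML;

$document = new DOMDocument();
$document->loadXML($content);

$result = [];
// Match by namespace URI and local name; the prefix in the document is irrelevant.
foreach ($document->getElementsByTagNameNS('http://www.opengis.net/gml', 'pos') as $pos) {
    $result[] = $pos->textContent;
}

var_dump($result);
```

Note that this collects every gml:pos in the document, whereas the XPath expression above restricts the match to positions inside a p:Point.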


Answer 2:

This would be easier using XPath, since you have nodes nested deeply in alternating namespaces, but since you are using SimpleXML I'll show you a solution using that framework.

This

$output->children($xml["p"]);

won't work because the root node has no children in the p namespace. You have to navigate the tree until you are in the right context. With XPath you could fetch them all with a descendant axis expression, which would be simpler. The code below works with SimpleXML:

$pointProperty = $output
                 ->children($xml["gml"])->featureMember
                 ->children($xml["p"])->Point
                 ->children($xml["gml"]);

Now you can loop over the children of pointProperty to get your Point nodes:

foreach($pointProperty->children($xml["gml"]) as $point)
    print_r($point);

From there on, the namespace doesn't change, so you can navigate normally and get the data in the pos elements. Here is an example:

echo '<table border="1">'."\n";
echo '  <tr><th>srsName</th><th>Longitude</th><th>Latitude</th></tr>'."\n";
foreach($pointProperty->children($xml["gml"]) as $point) {
    $coords = explode (' ', $point->pos);
    echo '  <tr><td>'.$point->attributes()['srsName'].'</td>';
    echo '<td>'.$coords[0].'</td>';
    echo '<td>'.$coords[1].'</td></tr>'."\n";
}
echo '</table>'."\n";

This will print a table containing your data. You can adapt this to fit your needs:

<table border="1">
  <tr><th>srsName</th><th>Longitude</th><th>Latitude</th></tr>
  <tr><td>epsg:4258</td><td>-3.84307585</td><td>43.46031547</td></tr>
  <tr><td>epsg:4258</td><td>-3.84299411</td><td>43.46018513</td></tr>
  ...
  <tr><td>epsg:4258</td><td>-3.84307585</td><td>43.46031547</td></tr>
</table>

Here's a working PHP Fiddle you can try out online.
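As a variation, the same children() navigation can be written with the namespace URIs passed in directly instead of taking them from getNamespaces(), so the code keeps working if a document uses other prefixes. This is only a sketch: the constant names NS_GML and NS_P, the $rows array, and the shortened $content sample are assumptions made for the example.

```php
<?php
// Hypothetical constant names; the URIs are the ones declared in the question's XML.
const NS_GML = 'http://www.opengis.net/gml';
const NS_P   = 'http://example.org';

// Shortened, hypothetical sample standing in for the full document.
$content = <<<XML
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:p="http://example.org">
  <gml:featureMember>
    <p:Point>
      <gml:pointProperty>
        <gml:Point srsName="epsg:4258"><gml:pos>-3.84307585 43.46031547</gml:pos></gml:Point>
        <gml:Point srsName="epsg:4258"><gml:pos>-3.84299411 43.46018513</gml:pos></gml:Point>
      </gml:pointProperty>
    </p:Point>
  </gml:featureMember>
</gml:FeatureCollection>
XML;

$output = simplexml_load_string($content);

// Switch namespace with children() at each level where it changes.
$points = $output
    ->children(NS_GML)->featureMember
    ->children(NS_P)->Point
    ->children(NS_GML)->pointProperty
    ->children(NS_GML)->Point;

$rows = [];
foreach ($points as $point) {
    // Cast to string before splitting "longitude latitude".
    list($longitude, $latitude) = explode(' ', (string) $point->pos);
    $rows[] = [
        'srsName'   => (string) $point->attributes()->srsName,
        'longitude' => $longitude,
        'latitude'  => $latitude,
    ];
}

print_r($rows);
```

Collecting the values into an array first keeps the parsing separate from the output, so the same rows could feed the HTML table above or a database insert later.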