SimpleXML children's attributes behave differently with namespaces

Posted 2019-06-24 09:19

Question:

The SimpleXML examples page, section "Example #5 Using attributes" states:

Access attributes of an element just as you would elements of an array.

Example #1 on the SimpleXMLElement::children() page also uses the $element['attribute'] syntax to access children's attributes.

Adding a namespace to that code breaks access to the attributes:

$xml = new SimpleXMLElement(
'<person xmlns:a="foo:bar">
  <a:child role="son">
    <a:child role="daughter"/>
  </a:child>
  <a:child role="daughter">
    <a:child role="son">
      <a:child role="son"/>
    </a:child>
  </a:child>
</person>');
foreach ($xml->children('a', true) as $second_gen) {
    echo ' The person begot a ' . $second_gen['role'];
    foreach ($second_gen->children('a', true) as $third_gen) {
        echo ' who begot a ' . $third_gen['role'] . ';';
        foreach ($third_gen->children('a', true) as $fourth_gen) {
            echo ' and that ' . $third_gen['role'] . ' begot a ' . $fourth_gen['role'];
        }
    }
}
// results
// The person begot a who begot a ; The person begot a who begot a ; and that begot a 
// expected
// The person begot a son who begot a daughter; The person begot a daughter who begot a son; and that son begot a son

There are plenty of questions here pointing to the same solution, which is to use the SimpleXMLElement::attributes() method instead of array-style access, but none of the answers explains why.

Is this different behavior when using namespaces a bug? Is the documentation outdated? Should we always use SimpleXMLElement::attributes() and avoid the recommended array-like syntax?

Note: I'm using PHP 5.5.9-1ubuntu4.9.


Related questions

  • Retrieving attributes of namespaced children
  • how to access this child element - attribute in php simplexml
  • Get children attributes using simplexml

Answer 1:

The reason for this is not actually anything to do with SimpleXML, but to do with some surprising details of how XML namespaces work, according to the standard.

In your example, you have a namespace declared with the prefix a, so to declare that an attribute is in that namespace, you must prefix its name with a:, just as you do with elements:

<a:child a:role="daughter"/>

It seems to be a common assumption that an attribute without a namespace prefix is in the same namespace as the element it is on, but that is not the case. The example above is not equivalent to your example:

<a:child role="daughter"/>

Another case you might see is where there is a default (unprefixed) namespace:

<person xmlns="http://example.com/foo.bar"><child role="daughter" /></person>

Here, the child element is in the http://example.com/foo.bar namespace, but the role attribute still isn't! As discussed in this related question, the relevant section of the XML Namespaces spec includes this statement:

The namespace name for an unprefixed attribute name always has no value.

That is, an attribute with no namespace prefix is never in any namespace, regardless of what the rest of the document looks like.
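
One way to check this for yourself is with PHP's DOM extension, which exposes a namespaceURI property on every node. The snippet below is a minimal sketch that reuses the default-namespace document above and the prefixed attribute from the first example; the variable names are only illustrative.

```php
<?php
// The default-namespace document from above: the element is namespaced,
// but its unprefixed attribute is not.
$doc = new DOMDocument();
$doc->loadXML('<person xmlns="http://example.com/foo.bar"><child role="daughter" /></person>');

$child = $doc->getElementsByTagNameNS('http://example.com/foo.bar', 'child')->item(0);
$role  = $child->getAttributeNode('role');

var_dump($child->namespaceURI); // the element reports http://example.com/foo.bar
var_dump($role->namespaceURI);  // the attribute reports NULL

// With an explicit prefix, the attribute does land in the namespace.
$doc2 = new DOMDocument();
$doc2->loadXML('<a:child xmlns:a="foo:bar" a:role="daughter"/>');
$prefixedRole = $doc2->documentElement->getAttributeNodeNS('foo:bar', 'role');

var_dump($prefixedRole->namespaceURI); // the attribute reports foo:bar
```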

So, what effect does this have on SimpleXML?

SimpleXML works on the basis of altering the "current namespace" whenever you use the ->children() or ->attributes() methods, and tracking it from then on.

So when you write:

$children = $xml->children('a', true);

or:

$children = $xml->children('foo:bar');

the "current namespace" is foo:bar. Subsequent use of the ->childElement or ['attribute'] syntax will look in this namespace - you don't need to call children() again every time - but your unprefixed attributes won't be found there, because they have no namespace.
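
As a rough illustration of that tracking, the sketch below uses a cut-down version of your document in which one nested element carries a prefixed a:role attribute. That prefixed attribute is an addition made purely for contrast and is not part of your original XML.

```php
<?php
$xml = new SimpleXMLElement(
    '<person xmlns:a="foo:bar">
       <a:child role="son"><a:child a:role="daughter"/></a:child>
     </person>'
);

// Select the foo:bar namespace once, via its prefix.
$first = $xml->children('a', true)->child;

// The namespace is remembered, so the nested a:child is found
// without calling children() again.
$nested = $first->child;
var_dump($nested->getName());       // the element name, "child"

// Unprefixed role="son" has no namespace, so it is not found here.
var_dump(isset($first['role']));    // false

// Prefixed a:role="daughter" is in foo:bar, so it is found.
var_dump((string) $nested['role']); // "daughter"
```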

When you subsequently write:

$attributes = $children->attributes();

this is interpreted the same way as:

$attributes = $children->attributes(null);

So now, the "current namespace" is null. Now when you look for the attributes which have no namespace, you will find them.
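
Applied to the code in your question, that could look something like the sketch below. It collects the text in a string (the $out and $third_role variables are introduced here only for illustration) and reads each role through attributes() instead of directly through array access.

```php
<?php
$xml = new SimpleXMLElement(
'<person xmlns:a="foo:bar">
  <a:child role="son">
    <a:child role="daughter"/>
  </a:child>
  <a:child role="daughter">
    <a:child role="son">
      <a:child role="son"/>
    </a:child>
  </a:child>
</person>');

$out = '';
foreach ($xml->children('a', true) as $second_gen) {
    // attributes() with no arguments switches to "no namespace",
    // which is where the unprefixed role attribute lives.
    $out .= ' The person begot a ' . $second_gen->attributes()['role'];
    foreach ($second_gen->children('a', true) as $third_gen) {
        $third_role = $third_gen->attributes()['role'];
        $out .= ' who begot a ' . $third_role . ';';
        foreach ($third_gen->children('a', true) as $fourth_gen) {
            $out .= ' and that ' . $third_role . ' begot a ' . $fourth_gen->attributes()['role'];
        }
    }
}
echo $out;
```

Note that children('a', true) is still needed at each level to iterate the namespaced elements; only the attribute lookup changes.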