xPath finds nothing but *

Posted 2019-01-09 16:22

This is starting to piss me off real bad. I have this XML code:

Updated with correct namespaces

<?xml version="1.0" encoding="utf-8"?>

<Infringement xsi:schemaLocation="http://www.movielabs.com/ACNS http://www.movielabs.com/ACNS/ACNS2v1.xsd" xmlns="http://www.movielabs.com/ACNS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Case>
    <ID>...</ID>
    <Status>Open</Status>
  </Case>
  <Complainant>
    <Entity>...</Entity>
    <Contact>...</Contact>
    <Address>...</Address>
    <Phone>...</Phone>
    <Email>...</Email>
  </Complainant>
  <Service_Provider>
    <Entity>...</Entity>
    <Address></Address>
    <Email>...</Email>
  </Service_Provider>
  <Source>
    <TimeStamp>...</TimeStamp>
    <IP_Address>...</IP_Address>
    <Port>...</Port>
    <DNS_Name></DNS_Name>
    <Type>...</Type>
    <UserName></UserName>
    <Number_Files>1</Number_Files>
    <Deja_Vu>No</Deja_Vu>
  </Source>
  <Content>
    <Item>
      <TimeStamp>...</TimeStamp>
      <Title>...</Title>
      <FileName>...</FileName>
      <FileSize>...</FileSize>
      <URL></URL>
    </Item>
  </Content>
</Infringement>

And this PHP code:

<?php 
    $data = urldecode($_POST["xml"]);
    $newXML = simplexml_load_string($data);

    var_dump($newXML->xpath("//ID"));
?>

I've dumped only $newXML and gotten tons of data, but the only xPath I've run that returned anything but an empty array was "*"

Isn't "//ID" supposed to find all ID nodes in the document? Why isn't it working?

Thanks

5 answers
手持菜刀,她持情操
#2 · 2019-01-09 17:01

Your XML document's root element declares a default namespace with the URI "http://www.movielabs.com/ACNS". This means that all unprefixed elements in your document belong to that namespace. The problem is that an XPath expression without a namespace prefix searches for elements that don't belong to any namespace. To search for elements (or attributes) from a certain namespace, you need to register the namespace URI under some prefix and then use that prefix in your XPath expression.

With PHP's SimpleXML, it is done something like this:

$newXML = simplexml_load_string($data);
$newXML->registerXPathNamespace('prefix', 'http://www.movielabs.com/ACNS');
var_dump($newXML->xpath("//prefix:ID"));

The prefix can be practically any text, but the namespace URI must match exactly the one used in your XML document.
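
As a rough, self-contained sketch of the difference: the document below is a trimmed-down stand-in for the one in the question, and both the ID value 12345 and the prefix name acns are made up for illustration.

```php
<?php
// Trimmed-down stand-in for the question's document; the ID value is invented.
$data = <<<XML
<Infringement xmlns="http://www.movielabs.com/ACNS">
  <Case>
    <ID>12345</ID>
    <Status>Open</Status>
  </Case>
</Infringement>
XML;

$newXML = simplexml_load_string($data);

// Unprefixed name test: only matches ID elements that are in no namespace.
$withoutPrefix = $newXML->xpath('//ID');

// Bind an arbitrary prefix to the document's namespace URI, then use it.
$newXML->registerXPathNamespace('acns', 'http://www.movielabs.com/ACNS');
$withPrefix = $newXML->xpath('//acns:ID');

var_dump(count($withoutPrefix));   // number of matches without a prefix
var_dump((string) $withPrefix[0]); // text of the first namespaced match
```

The prefix acns never appears in the XML itself; only the URI passed to registerXPathNamespace has to agree with the document.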

对你真心纯属浪费
#3 · 2019-01-09 17:03

> I've dumped only $newXML and gotten tons of data, but the only xPath I've run that returned anything but an empty array was "*"

So what was returned from var_dump($newXML->xpath("*"));? <Infringement>?

If the problem is namespaces, try this:

var_dump($newXML->xpath("//*[local-name() = 'ID']"));

This will match any element in the document whose name is 'ID', regardless of namespace.
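
A small sketch of how this behaves, assuming a made-up document with ID elements in two different namespaces and one in none (the urn:example:* URIs and the values are hypothetical):

```php
<?php
// Hypothetical document: one ID in no namespace, two in different namespaces.
$data = <<<XML
<root>
  <ID>plain</ID>
  <a:ID xmlns:a="urn:example:a">from-a</a:ID>
  <Case xmlns="urn:example:b"><ID>from-b</ID></Case>
</root>
XML;

$xml = simplexml_load_string($data);

// Plain name test: only the ID that is in no namespace.
$plainOnly = $xml->xpath('//ID');

// local-name() ignores the namespace, so all three ID elements match.
$anyNamespace = $xml->xpath("//*[local-name() = 'ID']");
$values = array_map('strval', $anyNamespace);

// Adding namespace-uri() narrows the match again without registering a prefix.
$onlyB = $xml->xpath(
    "//*[local-name() = 'ID' and namespace-uri() = 'urn:example:b']"
);

var_dump(count($plainOnly), $values, (string) $onlyB[0]);
```

The trade-off is that local-name() alone also matches same-named elements from unrelated vocabularies, which is why the namespace-uri() variant can be the safer choice.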

> My stuff works if i replace all "xmlns" with "ns"

Wait, what? Are you sure you showed us all the xmlns-related attributes in the document?

Update: The question was edited to show that the XML really does have a default namespace declaration. That explains the original problem: your XPath expression selects ID elements that are in no namespace, but the elements in your document are in the movielabs ACNS namespace, thanks to the default namespace declaration.

The declaration xmlns="http://www.movielabs.com/ACNS" on an element means "this element and all descendants that don't have a namespace prefix (like ID) are in the namespace represented by the namespace URI 'http://www.movielabs.com/ACNS'." (Unless an intervening descendant has a different default namespace declaration, which would shadow this one.)
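
The shadowing case can be sketched like this, assuming a made-up document in which an inner element redeclares the default namespace (the urn:example:* URIs and the prefixes o and i are hypothetical):

```php
<?php
// Hypothetical document: <inner> redeclares the default namespace,
// so its descendants are no longer in the outer namespace.
$data = <<<XML
<outer xmlns="urn:example:outer">
  <ID>outer-id</ID>
  <inner xmlns="urn:example:inner">
    <ID>inner-id</ID>
  </inner>
</outer>
XML;

$xml = simplexml_load_string($data);
$xml->registerXPathNamespace('o', 'urn:example:outer');
$xml->registerXPathNamespace('i', 'urn:example:inner');

// Each prefix only sees the ID elements from its own namespace.
$outerIds = array_map('strval', $xml->xpath('//o:ID'));
$innerIds = array_map('strval', $xml->xpath('//i:ID'));

var_dump($outerIds, $innerIds);
```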

So use my local-name() answer above to ignore namespaces, or use the registerXPathNamespace technique from the first answer to specify the movielabs ACNS namespace and use it as intended.

Lonely孤独者°
#4 · 2019-01-09 17:04

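Use this for any namespace (XPath 2.0 syntax; PHP's xpath() is built on libxml2, which implements only XPath 1.0, so there the equivalent is //*[local-name() = 'ID']):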

var_dump($newXML->xpath("//*:ID"));
放荡不羁爱自由
#5 · 2019-01-09 17:07

You have an XML namespace defined on the document element (the xmlns="http://www.movielabs.com/ACNS" attribute). The namespace is the URL http://www.movielabs.com/ACNS. A namespace name has to be a globally unique string (a URI), which is why URLs are often used: the chance that someone else uses your domain for a namespace is very low, and you can put some documentation at the URL.

The XML parser resolves the namespaces. Each node gets four properties.

For <Infringement xmlns="http://www.movielabs.com/ACNS"/>:

$namespaceURI => http://www.movielabs.com/ACNS
$localName => Infringement
$prefix => 
$nodeName => Infringement

For <movie:Infringement xmlns:movie="http://www.movielabs.com/ACNS"/>:

$namespaceURI => http://www.movielabs.com/ACNS
$localName => Infringement
$prefix => movie
$nodeName => movie:Infringement

$namespaceURI and $localName are stable. The other two depend on the prefix. The prefix is an alias for the namespace. The namespace URI is long and complex; using it on each element or attribute would make the XML a lot more difficult to read and write. But you can interpret the element nodes like:

{http://www.movielabs.com/ACNS}:Infringement

So the namespace is the one thing that defines what the nodes mean, not the prefix/alias. Prefixes can be redefined on a sub element.

<foo xmlns="urn:foo"><bar xmlns="urn:bar"/></foo>
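
The four properties listed above can be inspected with DOM. This is only a sketch: describeRoot is a hypothetical helper written for this illustration, not part of any library.

```php
<?php
// Hypothetical helper: collect the four namespace-related properties
// of a document's root element.
function describeRoot(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $node = $dom->documentElement;

    return [
        'namespaceURI' => $node->namespaceURI,
        'localName'    => $node->localName,
        'prefix'       => $node->prefix,
        'nodeName'     => $node->nodeName,
    ];
}

// Same namespace, once as the default and once bound to a prefix.
$default  = describeRoot('<Infringement xmlns="http://www.movielabs.com/ACNS"/>');
$prefixed = describeRoot('<movie:Infringement xmlns:movie="http://www.movielabs.com/ACNS"/>');

var_dump($default, $prefixed);
```

Comparing the two dumps shows that namespaceURI and localName agree, while prefix and nodeName differ.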

XPath uses the same concept with its own resolver. You register your own prefixes for a namespace, so it doesn't matter how the prefixes are used in the XML; only the namespace URI has to match.

In DOM you do this on the DOMXPath instance:

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('movie', 'http://www.movielabs.com/ACNS');

var_dump(
  $xpath->evaluate('string(/movie:Infringement/movie:Case/movie:ID)')
);

In SimpleXML, you can register the namespace on the SimpleXMLElement.

$element = simplexml_load_string($xml);
$element->registerXPathNamespace('movie', 'http://www.movielabs.com/ACNS');
var_dump(
  (string)$element->xpath('/movie:Infringement/movie:Case/movie:ID')[0]
);

HINT: The default namespace applies only to elements. Attributes are in the "no/empty namespace" unless they have a prefix.
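
A short sketch of that hint, assuming a made-up element that carries one unprefixed and one prefixed attribute (the urn:example:acns URI, the attribute names, and the prefixes x and a are hypothetical):

```php
<?php
// Hypothetical element: "id" has no prefix, "x:ref" is explicitly namespaced.
// The same URI is bound both as the default namespace and to the prefix x.
$xml = '<Case xmlns="urn:example:acns" xmlns:x="urn:example:acns" id="1" x:ref="2"/>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('a', 'urn:example:acns');

// The element needs the prefix, the unprefixed attribute must not have one.
$unprefixedAttr = $xpath->evaluate('string(/a:Case/@id)');

// Asking for the unprefixed attribute inside the namespace finds nothing.
$wrongCount = $xpath->evaluate('count(/a:Case/@a:id)');

// Only the explicitly prefixed attribute is in the namespace.
$prefixedAttr = $xpath->evaluate('string(/a:Case/@a:ref)');

var_dump($unprefixedAttr, $wrongCount, $prefixedAttr);
```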

甜甜的少女心
#6 · 2019-01-09 17:15

I'm not well-versed in PHP's XML API, but I suspect the problem lies in the namespaces. Depending on how that xpath method works, it may be searching for ID elements with an empty namespace. Your ID elements inherit their namespace from the root element.
