PHP/SimpleXML/XPath get attribute value by another

Posted 2019-03-06 19:56

I have this XML (from a pptx file):

<Relationships>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image2.png"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.wmf"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout1.xml"/>
</Relationships>

I want to pull the Target attribute from a Relationship element, and I know the Id value.

I could do it with SimpleXML by iterating through the nodes (like this question):

$resxml = simplexml_load_file('zip://my.pptx#ppt/slides/_rels/slide1.xml.rels');
echo $resxml->Relationship[0]->attributes()->Target;

But I would like to get it with XPath, using this sort of idea. Whatever I do in XPath returns an empty object when I search for something like 'rId3'. I thought the statement below would work, but it also returns an empty object. I have tried about 50 combinations and found a lot of similar but not identical issues when searching:

$image = $resxml->xpath("/Relationships/Relationship[@Id='rId3']/@Target"); 
print_r($image);

I guess I will just end up iterating through all the nodes, but that seems inefficient. My server appears to have DOM XPath available and SimpleXML enabled.

2 Answers
家丑人穷心不美
#2 · 2019-03-06 20:34

I think your problem might be the namespace. PPTX relationship files declare a default namespace on the root element. The example below uses "http://schemas.microsoft.com/package/2005/06/relationships", but check the source of your file for the exact URI. SimpleXML's xpath() also does some magic of its own. If the file declares a namespace, you have to register your own prefix for it.

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships
 xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
 <Relationship Id="rId1"
 Type="http://schemas.microsoft.com/office/2006/relationships/image"
 Target="http://en.wikipedia.org/images/wiki-en.png"
 TargetMode="External" />
 <Relationship Id="rId2"
 Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
 Target="http://www.wikipedia.org"
 TargetMode="External" />
</Relationships> 
XML;

$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('r', 'http://schemas.microsoft.com/package/2005/06/relationships');

var_dump(
  $xpath->evaluate("string(/r:Relationships/r:Relationship[@Id='rId2']/@Target)", NULL, FALSE)
);

Output:

string(24) "http://www.wikipedia.org"

XPath has no concept of a default namespace. An unprefixed name in an expression matches only elements that are in no namespace. Attributes are in no namespace unless they are explicitly prefixed.
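To illustrate the point about unprefixed names, here is a minimal sketch using SimpleXML. The inline document is a cut-down stand-in for a real .rels file, the namespace URI is the one current PPTX packages are assumed to declare, and the prefix 'r' is an arbitrary choice:

```php
<?php
// Cut-down sample document; a real .rels file carries more attributes.
$xml = <<<'XML'
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId3" Target="../media/image2.png"/>
</Relationships>
XML;

$sx = simplexml_load_string($xml);

// 1. No prefix: only matches elements in no namespace, so nothing is found.
$plain = $sx->xpath("/Relationships/Relationship[@Id='rId3']/@Target");

// 2. Registered prefix: matches the namespaced elements.
$sx->registerXPathNamespace('r', 'http://schemas.openxmlformats.org/package/2006/relationships');
$prefixed = $sx->xpath("/r:Relationships/r:Relationship[@Id='rId3']/@Target");

// 3. local-name() compares the element name and ignores its namespace.
$byLocalName = $sx->xpath("//*[local-name()='Relationship'][@Id='rId3']/@Target");

var_dump(empty($plain));            // bool(true)
var_dump((string) $prefixed[0]);    // "../media/image2.png"
var_dump((string) $byLocalName[0]); // "../media/image2.png"
```

The local-name() form avoids registering a prefix, at the cost of also matching a Relationship element from any other namespace.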

To make the confusion complete, the PHP functions (SimpleXMLElement::xpath(), DOMXpath::query() and DOMXpath::evaluate()) automatically register the namespace definitions of the context node. For the two DOMXpath methods, the third argument disables that behaviour.

Unlike the other two functions, DOMXpath::evaluate() can return scalars directly.
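As a rough sketch of that difference, the snippet below reads the same attribute once through query() and once through evaluate(). The inline document and the prefix 'r' are only for illustration:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    . '<Relationship Id="rId1" Target="../slideLayouts/slideLayout1.xml"/>'
    . '<Relationship Id="rId3" Target="../media/image2.png"/>'
    . '</Relationships>'
);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('r', 'http://schemas.openxmlformats.org/package/2006/relationships');

// query() returns a DOMNodeList, so the value has to be read from the node.
$nodes = $xpath->query("/r:Relationships/r:Relationship[@Id='rId3']/@Target");
$viaQuery = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

// evaluate() can hand back the scalar result of string(), count() and similar.
$viaEvaluate = $xpath->evaluate("string(/r:Relationships/r:Relationship[@Id='rId3']/@Target)");
$total = $xpath->evaluate("count(/r:Relationships/r:Relationship)");

var_dump($viaQuery);    // string(19) "../media/image2.png"
var_dump($viaEvaluate); // string(19) "../media/image2.png"
var_dump($total);       // float(2)
```

Note that string() on an empty node set yields an empty string, so with evaluate() a missing Id shows up as "" rather than as an empty list.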

Explosion°爆炸
#3 · 2019-03-06 20:38

Thank you. Your excellent answer was the key to me finding the solution. After reading your post, I found elsewhere on Stack Exchange that SimpleXML hides the namespace attributes on the first node. I had considered the namespace as the issue, but I only looked at the SimpleXML output when inspecting the tree. You put me right by pointing me to the real source.

My solution using just SimpleXML looks like this:

$resxml->registerXPathNamespace('r', 'http://schemas.openxmlformats.org/package/2006/relationships');
$image = $resxml->xpath("/r:Relationships/r:Relationship[@Id='rId3']/@Target"); 
print_r($image);
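
Since xpath() returns an array of SimpleXMLElement objects, the attribute value still has to be cast to a string. One possible way to wrap the lookup is sketched below. The function name relationshipTarget is made up for this example, and the inline XML stands in for the file loaded with simplexml_load_file():

```php
<?php
/**
 * Hypothetical helper: returns the Target for a relationship Id, or null if none matches.
 * Assumes $id contains no single quote, because it is interpolated into the XPath.
 */
function relationshipTarget(SimpleXMLElement $rels, string $id): ?string
{
    $rels->registerXPathNamespace('r', 'http://schemas.openxmlformats.org/package/2006/relationships');
    $hits = $rels->xpath("/r:Relationships/r:Relationship[@Id='{$id}']/@Target");

    // xpath() gives an empty result when nothing matches.
    return empty($hits) ? null : (string) $hits[0];
}

// Inline stand-in for ppt/slides/_rels/slide1.xml.rels
$rels = simplexml_load_string(
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    . '<Relationship Id="rId3" Target="../media/image2.png"/>'
    . '<Relationship Id="rId2" Target="../media/image1.wmf"/>'
    . '</Relationships>'
);

var_dump(relationshipTarget($rels, 'rId3')); // string(19) "../media/image2.png"
var_dump(relationshipTarget($rels, 'rId9')); // NULL
```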