How to detect and remove unnecessary xmlns:<something> declarations in PHP DOM?

Published 2019-01-26 00:51

Question:

Say I have a source document like this:

<element>
  <subelement xmlns:someprefix="mynamespace"/>
</element>

The xmlns:someprefix is obviously not needed here and doesn't do anything since that prefix is not being used in that element (or in my case, anywhere in the document).

In PHP, after I've loaded this into a DOM tree with DOMDocument->loadXML(), I'd like to be able to detect that such a namespace declaration exists, and remove it.

I know that I can read it with hasAttribute() and even remove it with removeAttributeNS() (strangely), but only if I know its prefix. It doesn't appear in DOMNode->attributes at all, because the thing I'm trying to find is not considered an attribute. I cannot see any way to detect that it is there without knowing the prefix, other than serialising the document back to an XML string and running a regex or something.
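For reference, here is a minimal sketch of the known-prefix case described above. It assumes the prefix `someprefix` and its URI are known in advance (the URI `http://mynamespace/asd` is borrowed from the answer's example). The removeAttributeNS() behaviour shown is the one reported in the question, so treat it as an assumption that may vary between PHP versions.

```php
<?php
// Sketch: only works when the prefix and its URI are known in advance.
$doc = new DOMDocument();
$doc->loadXML('<element><subelement xmlns:someprefix="http://mynamespace/asd"/></element>');
$sub = $doc->getElementsByTagName('subelement')->item(0);

// The declaration is not part of the attributes collection...
echo 'attributes: ', $sub->attributes->length, PHP_EOL;

// ...but hasAttribute() finds it when asked for "xmlns:<prefix>".
if ($sub->hasAttribute('xmlns:someprefix')) {
    // removeAttributeNS() is given the declared URI and the prefix as local name
    // (behaviour reported in the question; assumed here, not guaranteed).
    $sub->removeAttributeNS('http://mynamespace/asd', 'someprefix');
}

echo $doc->saveXML($doc->documentElement), PHP_EOL;
```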

How can I do it? Is there any way to query which namespaces (i.e. xmlns:something) have been declared in an element?
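One possible approach to that last question, sketched here as an addition rather than taken from the thread, uses the XPath namespace axis. `namespace::*` returns every namespace in scope on an element, so subtracting the parent's in-scope set leaves the declarations made on the element itself. The helper name declaredNamespaces() is hypothetical, and the sketch assumes PHP's DOMXPath returns DOMNameSpaceNode objects that expose `prefix` and `namespaceURI`.

```php
<?php
// Hypothetical helper: prefix => URI for the xmlns declarations made directly
// on $el, found without knowing any prefix in advance.
function declaredNamespaces(DOMXPath $xp, DOMElement $el): array
{
    // The namespace axis yields every namespace in scope on a node,
    // including the implicit "xml" one, which is skipped here.
    $inScope = function (DOMNode $node) use ($xp): array {
        $map = [];
        foreach ($xp->query('namespace::*', $node) as $ns) {
            if ($ns->prefix !== 'xml') {
                $map[(string) $ns->prefix] = $ns->namespaceURI;
            }
        }
        return $map;
    };

    $own = $inScope($el);
    $inherited = $el->parentNode instanceof DOMElement ? $inScope($el->parentNode) : [];

    // In scope here but not on the parent: declared (or rebound) on this element.
    return array_diff_assoc($own, $inherited);
}

$d = new DOMDocument();
$d->loadXML('<element><subelement xmlns:someprefix="http://mynamespace/asd"/></element>');
$xp = new DOMXPath($d);
foreach ($xp->query('//*') as $el) {
    foreach (declaredNamespaces($xp, $el) as $prefix => $uri) {
        echo $el->nodeName, ' declares ', $prefix, ' = ', $uri, PHP_EOL;
    }
}
```

Once a prefix and its URI are known this way, the known-prefix removal from the question could be applied to it.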

Answer 1:

How to detect:

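<?php
$d = new DOMDocument();
$d->loadXML('
<element>
  <subelement xmlns:someprefix="http://mynamespace/asd">
  </subelement>
</element>');

// SimpleXML can list every namespace declared anywhere in the document
// (recursive = true); DOM does not expose these declarations as attributes.
$sxe = simplexml_import_dom($d);
$namespaces = $sxe->getDocNamespaces(true);

$x = new DOMXPath($d);
foreach ($namespaces as $prefix => $url) {
    // Count elements in this namespace, or carrying an attribute in it.
    // Note: $url is pasted into the XPath string, so a URL containing ' breaks the query.
    $count = $x->evaluate("count(//*[namespace-uri()='" . $url . "' or @*[namespace-uri()='" . $url . "']])");
    echo $prefix . ' ( ' . $url . ' ): used ' . $count . ' times' . PHP_EOL;
}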

How to remove: pfff, about the only option I know of is to use xml_parse_into_struct() (with namespace processing off, it reports xmlns:* declarations as ordinary attributes) and loop through the resulting array with the XMLWriter functions, skipping namespace declarations that are not used. Not a fun pastime, so I'll leave the implementation up to you. Another option could be XSL, according to this question, but I doubt it is of much use. My best effort seems to succeed, but it moves 'top-level'/root-node namespaces to children, resulting in even more clutter.
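A rough sketch of that xml_parse_into_struct()/XMLWriter route follows, under a few assumptions. A prefix counts as "used" if it appears in any element or attribute name anywhere in the document (the question's case). Prefixes that occur only inside attribute values or text (for example xsi:type="p:x") are not detected. Comments and processing instructions are not carried over. The function name stripUnusedNsDecls() is hypothetical.

```php
<?php
// Hypothetical helper: drop xmlns:prefix declarations whose prefix is never
// used in any element or attribute name. Prefixes that only occur inside
// attribute values or text (e.g. xsi:type="p:x") are NOT detected, and
// comments/processing instructions are not carried over.
function stripUnusedNsDecls(string $xml): string
{
    // Without namespace processing, xmlns:* arrive as ordinary attributes.
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag case
    if (xml_parse_into_struct($parser, $xml, $values) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }

    // Pass 1: collect every prefix used in an element or attribute name.
    $used = [];
    foreach ($values as $v) {
        $names = array_merge([$v['tag']], array_keys($v['attributes'] ?? []));
        foreach ($names as $name) {
            $parts = explode(':', $name, 2);
            if (count($parts) === 2 && $parts[0] !== 'xmlns') {
                $used[$parts[0]] = true;
            }
        }
    }

    // Pass 2: write everything back out, skipping unused declarations.
    $w = new XMLWriter();
    $w->openMemory();
    foreach ($values as $v) {
        if ($v['type'] === 'cdata') {          // text between child elements
            $w->text($v['value']);
            continue;
        }
        if ($v['type'] === 'close') {
            $w->endElement();
            continue;
        }
        $w->startElement($v['tag']);           // 'open' or 'complete'
        foreach ($v['attributes'] ?? [] as $name => $value) {
            if (strncmp($name, 'xmlns:', 6) === 0 && !isset($used[substr($name, 6)])) {
                continue;                      // unused namespace declaration
            }
            $w->writeAttribute($name, $value);
        }
        if (isset($v['value'])) {              // text before the first child
            $w->text($v['value']);
        }
        if ($v['type'] === 'complete') {
            $w->endElement();
        }
    }
    return $w->outputMemory();
}
```

Because the used-prefix set is document-wide, a prefix declared in two places with different URIs is treated as one; that case is outside this sketch.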

edit: this seems to work:

Given this XML (with some extra namespace clutter added):

<element xmlns:yetanotherprefix="http://mynamespace/yet">
  <subelement
        xmlns:someprefix="http://mynamespace/foo"
        xmlns:otherprefix="http://mynamespace/bar"
        foo="bar"
        yetanotherprefix:bax="foz">
        <otherprefix:bar>
                <yetanotherprefix:element/>
                <otherprefix:element/>
        </otherprefix:bar>
        <otherprefix:bar>
                <yetanotherprefix:element/>
                <otherprefix:element/>
        </otherprefix:bar>
        <yetanotherprefix:baz/>
  </subelement>
</element>

With this XSL (the namespace declarations and the not() clause are based on the usage counts from the detection code above, so you'll still need that step, afaik):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:yetanotherprefix="http://mynamespace/yet"
    xmlns:otherprefix="http://mynamespace/bar">
    <!-- Declare the prefixes that are used, so that xsl:element can
         resolve qualified names such as otherprefix:bar. -->
    <xsl:template match="/">
        <xsl:apply-templates select="/*"/>
    </xsl:template>

    <!-- Rebuild every element under its own qualified name. -->
    <xsl:template match="*">
        <xsl:element name="{name(.)}">
            <xsl:apply-templates select="./@*"/>
            <!-- Copy the in-scope namespaces except the unused one(s). -->
            <xsl:copy-of select="namespace::*[not(name()='someprefix')]"/>
            <xsl:apply-templates select="./node()"/>
        </xsl:element>
    </xsl:template>

    <!-- Copy attributes unchanged. -->
    <xsl:template match="@*">
        <xsl:copy/>
    </xsl:template>
</xsl:stylesheet>

Results in:

<?xml version="1.0"?>
<element xmlns:yetanotherprefix="http://mynamespace/yet">
  <subelement xmlns:otherprefix="http://mynamespace/bar" foo="bar" yetanotherprefix:bax="foz">
        <otherprefix:bar>
                <yetanotherprefix:element/>
                <otherprefix:element/>
        </otherprefix:bar>
        <otherprefix:bar>
                <yetanotherprefix:element/>
                <otherprefix:element/>
        </otherprefix:bar>
        <yetanotherprefix:baz/>
  </subelement>
</element>
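The stylesheet above hardcodes both the used prefixes and the not() clause. Below is a sketch that generates them from the detection step and runs the result with PHP's XSLTProcessor. It assumes the xsl extension is installed, that every namespace has a non-empty prefix (default namespaces are skipped, not handled), and that no namespace URI contains a quote character. The function name stripUnusedWithXsl() is hypothetical.

```php
<?php
// Hypothetical helper. Assumes ext-xsl, prefixed namespaces only, and URIs
// without quote characters (they are pasted into XPath and XSL strings).
function stripUnusedWithXsl(DOMDocument $doc): string
{
    // Detection step, as in the answer: declared namespaces and their usage.
    $namespaces = simplexml_import_dom($doc)->getDocNamespaces(true);
    $xp = new DOMXPath($doc);

    $decls = '';   // xmlns:prefix="uri" for every used prefix
    $unused = [];  // name()='prefix' tests for the not() clause
    foreach ($namespaces as $prefix => $url) {
        if ($prefix === '') {
            continue; // default namespace: not handled by this sketch
        }
        $count = $xp->evaluate("count(//*[namespace-uri()='$url' or @*[namespace-uri()='$url']])");
        if ($count > 0) {
            $decls .= sprintf(' xmlns:%s="%s"', $prefix, htmlspecialchars($url, ENT_QUOTES));
        } else {
            $unused[] = "name()='$prefix'";
        }
    }
    $filter = $unused ? '[not(' . implode(' or ', $unused) . ')]' : '';

    // Same templates as the stylesheet above, with the prefix lists filled in.
    $xsl = <<<XSL
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"$decls>
  <xsl:template match="/">
    <xsl:apply-templates select="/*"/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:element name="{name(.)}">
      <xsl:apply-templates select="@*"/>
      <xsl:copy-of select="namespace::*$filter"/>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
XSL;

    $style = new DOMDocument();
    $style->loadXML($xsl);
    $proc = new XSLTProcessor();
    $proc->importStylesheet($style);
    return (string) $proc->transformToXML($doc);
}
```

Given the example document above, this should produce output along the lines of the result shown, though exact whitespace and where libxslt places each declaration may differ.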