Remove namespace from XML using PHP

Posted 2019-01-14 16:42

Question:

I have an XML document that looks like this:

<Data 
  xmlns="http://www.domain.com/schema/data" 
  xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
  <Something>...</Something>
</Data>

I am parsing the information using SimpleXML in PHP. I am dealing with arrays and I seem to be having a problem with the namespace.

My question is: How do I remove those namespaces? I read the data from an XML file.

Thank you!

Answer 1:

If you're using XPath, then this is a limitation of XPath rather than of PHP; see this explanation of XPath and default namespaces for more information.

More specifically, it's the xmlns="..." attribute (the default namespace declaration) on the root node that causes the problem. This means you'll need to register the namespace under a prefix and then use prefixed names (QNames) to refer to elements.

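// Load the XML document.
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
// Bind the prefix "a" to the default namespace URI declared in the XML.
$feed->registerXPathNamespace("a", "http://www.domain.com/schema/data");
// Prefix every element name; the leading "/" starts at the document root.
$result = $feed->xpath("/a:Data/a:Something/...");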

Important: The URI used in the registerXPathNamespace call must be identical to the one that is used in the actual XML file.
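
As a minimal sketch of the point above, the following applies the same registration to the document from the question, loaded from an inline string instead of a file. The prefix "a" is arbitrary, and the element text "value" is a placeholder added for illustration.

```php
<?php
// Sample document from the question; the text "value" is a placeholder.
$xmlString = <<<XML
<Data
  xmlns="http://www.domain.com/schema/data"
  xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
  <Something>value</Something>
</Data>
XML;

$data = simplexml_load_string($xmlString);

// Unprefixed names in XPath 1.0 mean "no namespace", so this finds nothing.
$plain = $data->xpath('/Data/Something');

// Bind an arbitrary prefix to the default namespace URI, then use it.
$data->registerXPathNamespace('a', 'http://www.domain.com/schema/data');
$matched = $data->xpath('/a:Data/a:Something');

echo empty($plain) ? 'unprefixed: no match' : 'unprefixed: match', "\n";
echo 'prefixed: ', (string) $matched[0], "\n";
```

Only the URI has to match the document; the prefix chosen in the query does not need to match anything in the XML.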



Answer 2:

I found the answer above to be helpful, but it didn't quite work for me. This ended up working better:

// Gets rid of all namespace definitions 
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);

// Gets rid of all namespace references
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);
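
To see what the two replacements do, here is a small sketch that runs them on an inline version of the question's document. The dmd:Info element and the text values are hypothetical additions, included so that the prefix-stripping step has something to act on.

```php
<?php
// Sample input; <dmd:Info> and the text values are hypothetical.
$xml_string = '<Data xmlns="http://www.domain.com/schema/data" '
    . 'xmlns:dmd="http://www.domain.com/schema/data-metadata">'
    . '<Something>value</Something>'
    . '<dmd:Info>meta</dmd:Info>'
    . '</Data>';

// Step 1: drop every xmlns and xmlns:prefix declaration.
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);

// Step 2: drop prefixes that sit directly before "=" or ">".
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);

$xml = simplexml_load_string($xml_string);

// Plain property access and unprefixed XPath now both work.
echo (string) $xml->Something, "\n";
echo (string) $xml->Info, "\n";
echo count($xml->xpath('/Data/Something')), "\n";
```

Note that the second pattern only matches a name followed immediately by "=" or ">", so a prefixed tag that carries attributes (for example <dmd:Info id="1">) would keep its prefix, and matching text outside of tags would be rewritten too.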


Answer 3:

The following PHP code automatically detects the default namespace specified in the XML file and registers it under the alias "default". Note that all XPath queries then have to be updated to include the prefix "default:".

So if you want to read XML files whether or not they contain a default namespace definition, and you want to query all Something elements, you could use the following code:

$xml = simplexml_load_file($name);
$namespaces = $xml->getDocNamespaces();
if (isset($namespaces[''])) {
    $defaultNamespaceUrl = $namespaces[''];
    $xml->registerXPathNamespace('default', $defaultNamespaceUrl);
    $nsprefix = 'default:';
} else {
    $nsprefix = '';
}

$somethings = $xml->xpath('//'.$nsprefix.'Something');

echo count($somethings).' times found';
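
As a sketch of how this behaves on both kinds of input, the same logic can be wrapped in a helper and run against two inline documents. The function name countSomethings and the sample strings are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper: counts <Something> elements whether or not the
// document declares a default namespace.
function countSomethings(string $xmlString): int
{
    $xml = simplexml_load_string($xmlString);
    $namespaces = $xml->getDocNamespaces();

    if (isset($namespaces[''])) {
        // A default namespace exists: register it and prefix the query.
        $xml->registerXPathNamespace('default', $namespaces['']);
        $nsprefix = 'default:';
    } else {
        $nsprefix = '';
    }

    $found = $xml->xpath('//' . $nsprefix . 'Something');
    return is_array($found) ? count($found) : 0;
}

$withNs = '<Data xmlns="http://www.domain.com/schema/data">'
    . '<Something>a</Something><Something>b</Something></Data>';
$withoutNs = '<Data><Something>a</Something></Data>';

echo countSomethings($withNs), "\n";
echo countSomethings($withoutNs), "\n";
```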


Answer 4:

To remove the namespace completely, you can use regular expressions (regex). For example:

$feed = file_get_contents("http://www.sitepoint.com/recent.rdf");
$feed = preg_replace('/\s+xmlns\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $feed); // This removes ALL default namespace declarations, leaving the tags themselves intact.
$xml_feed = simplexml_load_string($feed);

This strips the default XML namespaces before you load the XML. Be careful with the regex, though: if any field contains something like

<![CDATA[ <Transfer xmlns="http://redeux.example.com">cool.</Transfer> ]]>

then it will also strip the xmlns from inside the CDATA, which may lead to unexpected results.
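
The CDATA caveat can be reproduced with a short sketch. The pattern below is an assumption chosen for this illustration (it removes default xmlns attributes only), and the wrapping Data and Note elements are hypothetical.

```php
<?php
// Illustrative input: the xmlns inside CDATA is character data,
// not a real namespace declaration.
$feed = '<Data xmlns="http://www.domain.com/schema/data">'
    . '<Note><![CDATA[ <Transfer xmlns="http://redeux.example.com">cool.</Transfer> ]]></Note>'
    . '</Data>';

// Assumed pattern for this sketch: remove default xmlns attributes only.
$pattern = '/\s+xmlns\s*=\s*("[^"]*"|\'[^\']*\')/i';
$stripped = preg_replace($pattern, '', $feed);

$xml = simplexml_load_string($stripped);

// The text inside the CDATA section has lost its xmlns attribute as well.
echo trim((string) $xml->Note), "\n";
```

Because the regex works on the raw string, it cannot tell markup from text. If that distinction matters, the registration approaches from the earlier answers leave the document content untouched.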