Find all namespace declarations in an XML document

Posted 2019-03-11 16:31

Question:

As part of a Java 6 application, I want to find all namespace declarations in an XML document, including any duplicates.

Edit: Per Martin's request, here's the Java code I am using:

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//namespace::*");
NodeList nodeList = (NodeList) xPathExpression.evaluate(xmlDomDocument, XPathConstants.NODESET);

Suppose I have this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

To find all namespace declarations, I applied this XPath expression to the XML document using XPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

But if I switch to XPath 2.0, I get 16 namespace declarations (each of the previous declarations 4 times), which is not what I expect (or desire):

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com

The same difference appears even when I use the unabbreviated form of the XPath expression:

/descendant-or-self::node()/namespace::*

And it is seen across a variety of XML parsers (LIBXML, MSXML.NET, Saxon) as tested in oXygen. (Edit: As I mention later in the comments, this statement is not true. Though I thought I was testing a variety of XML parsers, I really wasn't.)

Question #1: Why the difference between XPath 1.0 and XPath 2.0?

Question #2: Is it possible/reasonable to get the desired results using XPath 2.0?

Hint: Using the distinct-values() function in XPath 2.0 will not return the desired results, because I want all namespace declarations, even if the same namespace is declared twice. For example, consider this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <bar:one xmlns:bar="http://www.bar.com">alpha</bar:one>
    <bar:two xmlns:bar="http://www.bar.com">bravo</bar:two>
</root>

The desired result is:

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/bar:one[1]/@xmlns:bar - http://www.bar.com
/root[1]/bar:two[1]/@xmlns:bar - http://www.bar.com

Answer 1:

I think this will get all namespaces, without any duplicates:

for $i in 1 to count(//namespace::*) return 
if (empty(index-of((//namespace::*)[position() = (1 to ($i - 1))][name() = name((//namespace::*)[$i])], (//namespace::*)[$i]))) 
then (//namespace::*)[$i] 
else ()


Answer 2:

To find all namespace declarations, I applied this XPath expression to the XML document using XPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com 
/root[1]/@xmlns:txt - textnode.com 
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

You are using a non-compliant (buggy) XPath 1.0 implementation.

I get different and correct results with all XSLT 1.0 processors I have. This transformation (just evaluating the XPath expression and printing one line for each selected namespace node):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:for-each select="//namespace::*">
       <xsl:value-of select="concat(name(), ': ', ., '&#xA;')"/>
     </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

produces a correct result:

xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com

with all of these XSLT 1.0 and XSLT 2.0 processors:

MSXML3, MSXML4, MSXML6, .NET XslCompiledTransform, .NET XslTransform, Altova (XML SPY), Saxon 6.5.4, Saxon 9.1.07, XQSharp.

Here is a short C# program that confirms the number of nodes selected in .NET is 16:

namespace TestNamespaces
{
    using System;
    using System.IO;
    using System.Xml.XPath;

    class Test
    {
        static void Main(string[] args)
        {
            string xml =
@"<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>
    <ele:one>a</ele:one>
    <two att:c='d'>e</two>
    <three>txt:f</three>
</root>";
            XPathDocument doc = new XPathDocument(new StringReader(xml));

            double count = 
              (double) doc.CreateNavigator().Evaluate("count(//namespace::*)");

            Console.WriteLine(count);
        }
    }
}

The result is:

16.

UPDATE:

This is an XPath 2.0 expression that finds just the "distinct" namespace nodes and produces a line of name - value pairs for each of them:

for $i in distinct-values(
             for $ns in //namespace::*
               return
                  index-of(
                           (for $x in //namespace::*
                              return concat(name($x), ' ', string($x))
                           ),
                           concat(name($ns), ' ', string($ns))
                          )[1]
           )
  return
    for $x in (//namespace::*)[$i]
      return concat(name($x), ' :', string($x), '&#xA;')


Answer 3:

As the previous thread indicates, //namespace::* will return all the namespace nodes, of which there are 16, according to both the XPath 1.0 and XPath 2.0 implementations. It doesn't surprise me if you've found an implementation that doesn't implement the spec correctly.
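One way to see where the figure of 16 comes from: in the XPath data model every element carries its own namespace node for each in-scope binding, including the implicit xml prefix. The sample document has four elements, each with four bindings in scope. The Java sketch below emulates that count by hand over a DOM; it does not call any XPath engine, and the class and method names (InScopeNamespaceCount, inScope, count) are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: counts in-scope namespace bindings per element,
// which is what //namespace::* selects under the XPath data model.
class InScopeNamespaceCount {

    // Returns prefix -> URI for every binding in scope on the element.
    static Map<String, String> inScope(Element e) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        // The xml prefix is always bound, without any declaration.
        result.put("xml", "http://www.w3.org/XML/1998/namespace");
        // Walk from the element up to the root; the nearest declaration wins.
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.item(i).getNodeName();
                String prefix;
                if (name.equals("xmlns")) {
                    prefix = "";
                } else if (name.startsWith("xmlns:")) {
                    prefix = name.substring("xmlns:".length());
                } else {
                    continue; // ordinary attribute
                }
                if (!result.containsKey(prefix)) {
                    result.put(prefix, attrs.item(i).getNodeValue());
                }
            }
        }
        // xmlns="" undeclares the default namespace, so no binding remains.
        if ("".equals(result.get(""))) {
            result.remove("");
        }
        return result;
    }

    // Sums the in-scope bindings over all elements of the document.
    static int count(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList all = doc.getElementsByTagName("*");
        int total = 0;
        for (int i = 0; i < all.getLength(); i++) {
            total += inScope((Element) all.item(i)).size();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>"
                + "<ele:one>a</ele:one><two att:c='d'>e</two><three>txt:f</three></root>";
        System.out.println(count(xml)); // prints 16 (4 elements x 4 bindings)
    }
}
```

The parser is deliberately left in its default, non-namespace-aware mode so that declarations show up as plain attributes whose names start with xmlns.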

Finding all the namespace declarations (as distinct from namespace nodes) is not in general possible with either XPath 1.0 or XPath 2.0, because the following two documents are considered equivalent at the data model level:

document A:

<a xmlns="one">
  <b/>
</a> 

document B:

<a xmlns="one">
  <b xmlns="one"/>
</a>

But if we define a "significant namespace declaration" to be a namespace that is present on a child element but not on its parent, then you could try this XPath 2.0 expression:

for $e in //* return
  for $n in $e/namespace::* return
     if (not(some $p in $e/../namespace::*
             satisfies ($p/name() eq $n/name() and string($p) eq string($n))))
     then concat($e/name(), '->', $n/name(), '=', string($n))
     else ()


Answer 4:

Here are my results using the XPath 1.0 implementations of .NET's XPathDocument (XSLT/XPath 1.0 data model), XmlDocument (DOM data model) and MSXML 6's DOM; the test code run against your sample XML document is

    Console.WriteLine("XPathDocument:");
    XPathDocument xpathDoc = new XPathDocument("../../XMLFile4.xml");
    foreach (XPathNavigator nav in xpathDoc.CreateNavigator().Select("//namespace::*"))
    {
        Console.WriteLine("Node type: {0}; name: {1}; value: {2}.", nav.NodeType, nav.Name, nav.Value);
    }
    Console.WriteLine();

    Console.WriteLine("DOM XmlDocument:");
    XmlDocument doc = new XmlDocument();
    doc.Load("../../XMLFile4.xml");
    foreach (XmlNode node in doc.SelectNodes("//namespace::*"))
    {
        Console.WriteLine("Node type: {0}; name: {1}; value: {2}.", node.NodeType, node.Name, node.Value);
    }
    Console.WriteLine();


    Console.WriteLine("MSXML 6 DOM:");
    dynamic msxmlDoc = Activator.CreateInstance(Type.GetTypeFromProgID("Msxml2.DOMDocument.6.0"));
    msxmlDoc.load("../../XMLFile4.xml");
    foreach (dynamic node in msxmlDoc.selectNodes("//namespace::*"))
    {
        Console.WriteLine("Node type: {0}; name: {1}; value: {2}.", node.nodeType, node.name, node.nodeValue);
    }

and its output is

XPathDocument:
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.

DOM XmlDocument:
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.

MSXML 6 DOM:
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.

So it is certainly not an XPath 1.0 versus XPath 2.0 problem. I think the problem you see is a shortcoming of mapping the XPath data model, which has namespace nodes, onto the DOM model, which has attribute nodes. Someone more familiar with the Java XPath API will need to tell you whether the behaviour you see is legitimately implementation-dependent, because the API specification is not precise enough about mapping the XPath namespace axis onto the DOM model, or whether it is a bug.
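Because the DOM keeps namespace declarations as attribute nodes, one possible way to sidestep the namespace axis in Java is to walk the DOM and collect the xmlns attributes directly. This is a different technique from the XPath expressions discussed in this thread, offered only as a sketch: the class name NamespaceDeclarationFinder, the method find and the output format are assumptions for illustration, and the parser is assumed to be in its default, non-namespace-aware mode.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists namespace declarations exactly as written,
// so a namespace declared on two elements is reported twice.
class NamespaceDeclarationFinder {

    static List<String> find(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> found = new ArrayList<String>();
        // getElementsByTagName("*") returns every element in document order.
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Node element = elements.item(i);
            NamedNodeMap attrs = element.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                String name = attrs.item(j).getNodeName();
                // A declaration is an attribute named xmlns or xmlns:prefix.
                if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                    found.add(element.getNodeName() + "/@" + name
                            + " - " + attrs.item(j).getNodeValue());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<bar:one xmlns:bar='http://www.bar.com'>alpha</bar:one>"
                + "<bar:two xmlns:bar='http://www.bar.com'>bravo</bar:two>"
                + "</root>";
        for (String declaration : find(xml)) {
            System.out.println(declaration);
        }
    }
}
```

For the two-element bar example from the question this reports one declaration on bar:one and one on bar:two. It reports nothing for the xml prefix, because that binding is implicit and never written as an attribute. The order of declarations within a single element follows the parser's attribute map rather than the source text.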