Why is XmlNamespaceManager necessary?

Posted 2019-01-30 07:48

Question:

I've come up kinda dry as to why -- at least in the .NET Framework -- it is necessary to use an XmlNamespaceManager in order to handle namespaces (or the rather clunky and verbose [local-name()=... XPath predicate/function/whatever) when performing XPath queries. I do understand why namespaces are necessary or at least beneficial, but why is it so complex?

In order to query a simple XML Document (no namespaces)...

<?xml version="1.0" encoding="ISO-8859-1"?>
<rootNode>
   <nodeName>Some Text Here</nodeName>
</rootNode>

...one can use something like doc.SelectSingleNode("//nodeName") (which would match <nodeName>Some Text Here</nodeName>)

Mystery #1: My first annoyance -- if I understand correctly -- is that merely adding a namespace declaration to the parent/root tag (whether used as part of a child node tag or not) like so:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rootNode xmlns="http://someplace.org">
   <nodeName>Some Text Here</nodeName>
</rootNode>

...requires several extra lines of code to get the same result:

Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
nsmgr.AddNamespace("ab", "http://someplace.org")
Dim desiredNode As XmlNode = doc.SelectSingleNode("//ab:nodeName", nsmgr)

...essentially dreaming up a non-existent prefix ("ab") to find a node that doesn't even use a prefix. How does this make sense? What is wrong (conceptually) with doc.SelectSingleNode("//nodeName")?

Mystery #2: So, say you've got an XML document that uses prefixes:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rootNode xmlns:cde="http://someplace.org" xmlns:feg="http://otherplace.net">
   <cde:nodeName>Some Text Here</cde:nodeName>
   <feg:nodeName>Some Other Value</feg:nodeName>
   <feg:otherName>Yet Another Value</feg:otherName>
</rootNode>

... If I understand correctly, you would have to add both namespaces to the XmlNamespaceManager, in order to make a query for a single node...

Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
nsmgr.AddNamespace("cde", "http://someplace.org")
nsmgr.AddNamespace("feg", "http://otherplace.net")
Dim desiredNode As XmlNode = doc.SelectSingleNode("//feg:nodeName", nsmgr)

... Why, in this case, do I need (conceptually) a namespace manager?


Edit Added: My revised and refined question is based upon the apparent redundancy of the XmlNamespaceManager in what I believe to be the majority of cases and the use of the namespace manager to specify a mapping of prefix to URI:

When the direct mapping of the namespace prefix ("cde") to the namespace URI ("http://someplace.org") is explicitly stated in the source document:

...<rootNode xmlns:cde="http://someplace.org"...

what is the conceptual need for a programmer to recreate that mapping before making a query?

Answer 1:

The basic point (as Kev points out) is that the namespace URI is the important part of a namespace, not the prefix; the prefix is an "arbitrary convenience".

As for why you need a namespace manager, rather than there being some magic that works it out using the document, I can think of two reasons.

Reason 1

If it were permitted to only add namespace declarations to the documentElement, as in your examples, it would indeed be trivial for selectSingleNode to just use whatever is defined.

However, you can declare namespace prefixes on any element in a document, and a prefix is not uniquely bound to one namespace within a document. Consider the following example:

<w xmlns:a="mynamespace">
  <a:x>
    <y xmlns:a="myOthernamespace">
      <z xmlns="mynamespace" />
      <b:z xmlns:b="mynamespace" />
      <z xmlns="myOthernamespace" />
      <b:z xmlns:b="myOthernamespace" />
    </y>
  </a:x>
</w>

In this example, what would you want //z, //a:z and //b:z to return? How, without some kind of external namespace manager, would you express that?
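To make the ambiguity concrete, here is a rough C# sketch. It loads the example document (with the z elements written as empty elements so that it parses) and shows that the prefix a resolves to a different URI depending on where you stand in the tree, and that what //b:z selects depends entirely on what the caller binds b to. The class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

public static class ScopedPrefixDemo
{
    // The example document, with the z elements self-closed so it is well-formed.
    public const string Xml =
        "<w xmlns:a=\"mynamespace\">" +
        "<a:x>" +
        "<y xmlns:a=\"myOthernamespace\">" +
        "<z xmlns=\"mynamespace\" />" +
        "<b:z xmlns:b=\"mynamespace\" />" +
        "<z xmlns=\"myOthernamespace\" />" +
        "<b:z xmlns:b=\"myOthernamespace\" />" +
        "</y>" +
        "</a:x>" +
        "</w>";

    public static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc;
    }

    // Counts the z elements whose namespace URI is the one the caller binds "b" to.
    public static int CountZ(XmlDocument doc, string uriForB)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("b", uriForB);
        return doc.SelectNodes("//b:z", nsmgr).Count;
    }

    public static void Main()
    {
        XmlDocument doc = Load();
        XmlNode x = doc.DocumentElement.FirstChild; // <a:x>
        XmlNode y = x.FirstChild;                   // <y>

        // The same prefix means different things at different depths.
        Console.WriteLine(x.GetNamespaceOfPrefix("a")); // mynamespace
        Console.WriteLine(y.GetNamespaceOfPrefix("a")); // myOthernamespace

        // What "//b:z" selects is decided by the caller's binding, not the document's.
        Console.WriteLine(CountZ(doc, "mynamespace"));      // 2
        Console.WriteLine(CountZ(doc, "myOthernamespace")); // 2

        // An unprefixed name test only matches elements in no namespace.
        Console.WriteLine(doc.SelectNodes("//z").Count);    // 0
    }
}
```

Each binding of b picks up both the unprefixed and the prefixed z in that namespace, which is something no single choice of in-document prefix could express.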

Reason 2

It allows you to reuse the same XPath expression for any equivalent document, without needing to know anything about the namespace prefixes in use.

myXPathExpression = "//z:y"
doc1.selectSingleNode(myXPathExpression);
doc2.selectSingleNode(myXPathExpression);

doc1:

<x>
  <z:y xmlns:z="mynamespace" />
</x>

doc2:

<x xmlns="mynamespace">
  <y />
</x>

In order to achieve this latter goal without a namespace manager, you would have to inspect each document, building a custom XPath expression for each one.
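A small C# sketch of that reuse, assuming the two documents above in well-formed form. The helper name FindY is hypothetical; it binds z to "mynamespace" and runs the same "//z:y" expression regardless of how each document spells its prefixes.

```csharp
using System;
using System.Xml;

public static class ReusableXPathDemo
{
    // Runs "//z:y" with "z" bound to "mynamespace" and returns the
    // qualified name of the first match, or null if nothing matches.
    public static string FindY(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("z", "mynamespace");

        XmlNode node = doc.SelectSingleNode("//z:y", nsmgr);
        return node == null ? null : node.Name;
    }

    public static void Main()
    {
        // doc1 uses an explicit prefix, doc2 uses a default namespace.
        string doc1 = "<x><z:y xmlns:z=\"mynamespace\" /></x>";
        string doc2 = "<x xmlns=\"mynamespace\"><y /></x>";

        Console.WriteLine(FindY(doc1)); // z:y
        Console.WriteLine(FindY(doc2)); // y
    }
}
```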



Answer 2:

The reason is simple. There is no required connection between the prefixes you use in your XPath query and the prefixes declared in the XML document. For example, the following XML documents are semantically equivalent:

<aaa:root xmlns:aaa="http://someplace.org">
 <aaa:element>text</aaa:element>
</aaa:root>

vs

  <bbb:root xmlns:bbb="http://someplace.org">
     <bbb:element>text</bbb:element>
  </bbb:root>

The "ccc:root/ccc:element" query will match both instances provided there is a mapping in the namespace manager for that.

nsmgr.AddNamespace("ccc", "http://someplace.org")

The .NET implementation does not care about the literal prefixes used in the XML, only that the prefix used in the query is defined in the namespace manager and that its namespace URI matches the one in the document. This is what lets you keep query expressions constant even when prefixes vary between consumed documents, and it is the correct implementation for the general case.
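Here is a minimal C# sketch of both halves of that statement, using the two documents above. The Query helper is made up for illustration. As far as I know, an unmapped prefix in the query surfaces as an XPathException, so the sketch catches that type rather than relying on any particular message.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class PrefixIndependenceDemo
{
    // Runs "ccc:root/ccc:element" against the given XML. When mapPrefix is
    // false the manager has no binding for "ccc".
    public static string Query(string xml, bool mapPrefix)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        if (mapPrefix)
            nsmgr.AddNamespace("ccc", "http://someplace.org");

        try
        {
            XmlNode node = doc.SelectSingleNode("ccc:root/ccc:element", nsmgr);
            return node == null ? "no match" : node.InnerText;
        }
        catch (XPathException)
        {
            // The query prefix could not be resolved to a namespace URI.
            return "XPathException";
        }
    }

    public static void Main()
    {
        string a = "<aaa:root xmlns:aaa=\"http://someplace.org\"><aaa:element>text</aaa:element></aaa:root>";
        string b = "<bbb:root xmlns:bbb=\"http://someplace.org\"><bbb:element>text</bbb:element></bbb:root>";

        Console.WriteLine(Query(a, true));  // text
        Console.WriteLine(Query(b, true));  // text
        Console.WriteLine(Query(a, false)); // XPathException
    }
}
```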



Answer 3:

As far as I can tell, there is no good reason that you should need to manually define an XmlNamespaceManager to get at abc-prefixed nodes if you have a document like this:

<itemContainer xmlns:abc="http://abc.com" xmlns:def="http://def.com">
    <abc:nodeA>...</abc:nodeA>
    <def:nodeB>...</def:nodeB>
    <abc:nodeC>...</abc:nodeC>
</itemContainer>

Microsoft simply couldn't be bothered to write something to detect that xmlns:abc had already been specified in a parent node. I could be wrong, and if so, I'd welcome comments on this answer so I can update it.

However, this blog post seems to confirm my suspicion. It basically says that you need to manually define an XmlNamespaceManager and manually iterate through the xmlns: attributes, adding each one to the namespace manager. Dunno why Microsoft couldn't do this automatically.

Here's a method I created based on that blog post to automatically generate an XmlNamespaceManager based on the xmlns: attributes of a source XmlDocument:

/// <summary>
/// Creates an XmlNamespaceManager based on a source XmlDocument's name table, and prepopulates its namespaces with any 'xmlns:' attributes of the root node.
/// </summary>
/// <param name="sourceDocument">The source XML document to create the XmlNamespaceManager for.</param>
/// <returns>The created XmlNamespaceManager.</returns>
private XmlNamespaceManager createNsMgrForDocument(XmlDocument sourceDocument)
{
    XmlNamespaceManager nsMgr = new XmlNamespaceManager(sourceDocument.NameTable);

    foreach (XmlAttribute attr in sourceDocument.SelectSingleNode("/*").Attributes)
    {
        if (attr.Prefix == "xmlns")
        {
            nsMgr.AddNamespace(attr.LocalName, attr.Value);
        }
    }

    return nsMgr;
}

And I use it like so:

XPathNavigator xNav = xmlDoc.CreateNavigator();
XPathNodeIterator xIter = xNav.Select("//abc:nodeC", createNsMgrForDocument(xmlDoc));


Answer 4:

To answer point 1:

Setting a default namespace for an XML document still means that the nodes, even without a namespace prefix, i.e.:

<rootNode xmlns="http://someplace.org">
   <nodeName>Some Text Here</nodeName>
</rootNode>

are no longer in the "empty" namespace. You still need some way to reference these nodes using XPath, so you create a prefix to reference them, even if it is "made up".
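A short C# sketch of point 1, assuming the document above. An unprefixed name test in XPath 1.0 only matches elements in no namespace, so "//nodeName" finds nothing once the default namespace is declared. The made-up prefix ab and the local-name() workaround from the question both find the node. The Select helper is hypothetical.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    public const string Xml =
        "<rootNode xmlns=\"http://someplace.org\">" +
        "<nodeName>Some Text Here</nodeName>" +
        "</rootNode>";

    // Returns the text of the first match, or null when nothing matches.
    // When withManager is true, "ab" is bound to the document's default namespace URI.
    public static string Select(string xpath, bool withManager)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        XmlNode node;
        if (withManager)
        {
            var nsmgr = new XmlNamespaceManager(doc.NameTable);
            nsmgr.AddNamespace("ab", "http://someplace.org");
            node = doc.SelectSingleNode(xpath, nsmgr);
        }
        else
        {
            node = doc.SelectSingleNode(xpath);
        }
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // Unprefixed name: looks in "no namespace", so there is no match.
        Console.WriteLine(Select("//nodeName", false) == null);            // True

        // Made-up prefix bound to the right URI: matches.
        Console.WriteLine(Select("//ab:nodeName", true));                  // Some Text Here

        // Namespace-agnostic workaround: matches any nodeName in any namespace.
        Console.WriteLine(Select("//*[local-name()='nodeName']", false)); // Some Text Here
    }
}
```

The local-name() form avoids the manager, but it also stops distinguishing between namespaces, which is the very thing they exist for.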

To answer point 2:

<rootNode xmlns:cde="http://someplace.org" xmlns:feg="http://otherplace.net">
   <cde:nodeName>Some Text Here</cde:nodeName>
   <feg:nodeName>Some Other Value</feg:nodeName>
   <feg:otherName>Yet Another Value</feg:otherName>
</rootNode>

Internally in the instance document, the nodes that reside in a namespace are stored with their node name and their long namespace name; in W3C parlance this is called an expanded name.

For example <cde:nodeName> is essentially stored as <http://someplace.org:nodeName>. A namespace prefix is an arbitrary convenience for humans so that when we type out XML or have to read it we don't have to do this:

<rootNode>
   <http://someplace.org:nodeName>Some Text Here</http://someplace.org:nodeName>
   <http://otherplace.net:nodeName>Some Other Value</http://otherplace.net:nodeName>
   <http://otherplace.net:otherName>Yet Another Value</http://otherplace.net:otherName>
</rootNode>

When an XML document is searched, it's not searched by the friendly prefix; the search is done by namespace URI, so you have to tell XPath about your namespaces via a namespace table passed in using XmlNamespaceManager.
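You can see the expanded name from code. The following C# sketch prints each child of the document above as its namespace URI plus local name, read from the NamespaceURI and LocalName properties. The brace notation is just a display convention chosen for this example, and the class name is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ExpandedNameDemo
{
    public const string Xml =
        "<rootNode xmlns:cde=\"http://someplace.org\" xmlns:feg=\"http://otherplace.net\">" +
        "<cde:nodeName>Some Text Here</cde:nodeName>" +
        "<feg:nodeName>Some Other Value</feg:nodeName>" +
        "<feg:otherName>Yet Another Value</feg:otherName>" +
        "</rootNode>";

    // Formats a node as "{namespace URI}localName"; the prefix plays no part.
    public static string Describe(XmlNode node)
    {
        return "{" + node.NamespaceURI + "}" + node.LocalName;
    }

    // Returns the expanded names of the root element's child elements, in document order.
    public static List<string> DescribeChildren()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        var names = new List<string>();
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            if (child is XmlElement)
                names.Add(Describe(child));
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in DescribeChildren())
            Console.WriteLine(name);
        // {http://someplace.org}nodeName
        // {http://otherplace.net}nodeName
        // {http://otherplace.net}otherName
    }
}
```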



Answer 5:

You need to register the URI/prefix pairs to the XmlNamespaceManager instance to let SelectSingleNode() know which particular "nodeName" node you're referring to - the one from "http://someplace.org" or the one from "http://otherplace.net".

Please note that the concrete prefix name doesn't matter when you're doing the XPath query. I believe this works too:

Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
nsmgr.AddNamespace("any", "http://someplace.org")
nsmgr.AddNamespace("thing", "http://otherplace.net")
Dim desiredNode As XmlNode = doc.SelectSingleNode("//thing:nodeName", nsmgr)

SelectSingleNode() just needs a connection between the prefix from your XPath expression and the namespace URI.



Answer 6:

This thread has helped me understand the issue of namespaces much more clearly. Thanks. When I saw Jez's code, I tried it because it looked like a better solution than the one I had programmed. I discovered some shortcomings with it, though. As written, it looks only in the root node (but namespaces can be declared anywhere), and it doesn't handle default namespaces. I tried to address these issues by modifying his code, but to no avail.

Here is my version of that function. It uses regular expressions to find the namespace mappings throughout the file; works with default namespaces, giving them the arbitrary prefix 'ns'; and handles multiple occurrences of the same namespace.

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using System.Xml;
private XmlNamespaceManager CreateNamespaceManagerForDocument(XmlDocument document)
{
    var nsMgr = new XmlNamespaceManager(document.NameTable);

    // Find and remember each xmlns attribute, assigning the 'ns' prefix to default namespaces.
    // Keys are "prefix:URI", so repeated declarations of the same pair collapse into one entry.
    var nameSpaces = new Dictionary<string, string>();
    foreach (Match match in new Regex(@"xmlns:?(.*?)=([\x22\x27])(.+?)\2").Matches(document.OuterXml))
        nameSpaces[match.Groups[1].Value + ":" + match.Groups[3].Value] = match.Groups[1].Value == "" ? "ns" : match.Groups[1].Value;

    // Go through the dictionary, and number non-unique prefixes before adding them to the namespace manager.
    var prefixCounts = new Dictionary<string, int>();
    foreach (var namespaceItem in nameSpaces)
    {
        var prefix = namespaceItem.Value;
        // Split on the first ':' only; the URI itself usually contains colons (e.g. "http://...").
        var namespaceURI = namespaceItem.Key.Split(new[] { ':' }, 2)[1];
        if (prefixCounts.ContainsKey(prefix))
            prefixCounts[prefix]++;
        else
            prefixCounts[prefix] = 0;
        // The "#;;" format renders 0 as an empty string, so the first use of a prefix stays unnumbered.
        nsMgr.AddNamespace(prefix + prefixCounts[prefix].ToString("#;;"), namespaceURI);
    }
    return nsMgr;
}
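For comparison, here is a sketch of the same idea without regular expressions, offered as one possible alternative rather than a drop-in replacement. It walks every element and reads its namespace declaration attributes through the DOM, which in XmlDocument carry the namespace URI http://www.w3.org/2000/xmlns/. The class name, the Build method, the default prefix "ns" and the sample URI http://default.example are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NamespaceManagerBuilder
{
    // Namespace URI that the DOM assigns to xmlns and xmlns:* attributes.
    private const string XmlnsUri = "http://www.w3.org/2000/xmlns/";

    // Hypothetical helper: registers every declaration found on any element.
    // Default namespaces get defaultPrefix; clashing prefixes get a numeric suffix.
    public static XmlNamespaceManager Build(XmlDocument doc, string defaultPrefix = "ns")
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        var assigned = new Dictionary<string, string>(); // prefix -> URI

        foreach (XmlElement element in doc.SelectNodes("//*"))
        {
            foreach (XmlAttribute attr in element.Attributes)
            {
                if (attr.NamespaceURI != XmlnsUri) continue; // not a namespace declaration
                string uri = attr.Value;
                if (uri.Length == 0) continue;               // xmlns="" only undeclares the default

                // "xmlns" has an empty prefix; "xmlns:foo" has prefix "xmlns", local name "foo".
                string basePrefix = attr.Prefix.Length == 0 ? defaultPrefix : attr.LocalName;
                if (basePrefix == "xml" || basePrefix == "xmlns") continue; // reserved prefixes

                // Reuse the prefix if it already maps to this URI, otherwise number it.
                string prefix = basePrefix;
                int n = 0;
                string existing;
                while (assigned.TryGetValue(prefix, out existing) && existing != uri)
                    prefix = basePrefix + (++n);

                assigned[prefix] = uri;
                nsmgr.AddNamespace(prefix, uri);
            }
        }
        return nsmgr;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<rootNode xmlns:cde=\"http://someplace.org\" xmlns:feg=\"http://otherplace.net\">" +
            "<cde:nodeName>Some Text Here</cde:nodeName>" +
            "<feg:nodeName>Some Other Value</feg:nodeName>" +
            "<inner xmlns=\"http://default.example\"><leaf>deep</leaf></inner>" +
            "</rootNode>");

        XmlNamespaceManager nsmgr = Build(doc);
        Console.WriteLine(doc.SelectSingleNode("//feg:nodeName", nsmgr).InnerText); // Some Other Value
        Console.WriteLine(doc.SelectSingleNode("//ns:leaf", nsmgr).InnerText);      // deep
    }
}
```

Note that any approach of this kind ties your XPath expressions to the prefixes a particular document happens to use, which is the trade-off the other answers describe.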