When I tried to traverse my XmlDocument using XPath, I ran into the problem that the document uses many namespaces, so I started using a NamespaceManager along with the XPath.
The XML looks like this:
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="KA0100401">
<Table>
<Row>
<Cell>Data</Cell>
</Row>
<!-- more rows... -->
</Table>
</Worksheet>
<Worksheet ss:Name="KA0100402">
<!-- ... -->
</Worksheet>
</Workbook>
Now, from what I can see in this document, "urn:schemas-microsoft-com:office:spreadsheet" is the default namespace, because it is declared with a plain xmlns attribute on the root element.
So, naively, I configured my NamespaceManager like this:
XmlDocument document = new XmlDocument();
document.Load(reader);
XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
manager.AddNamespace(String.Empty, "urn:schemas-microsoft-com:office:spreadsheet");
manager.AddNamespace("o", "urn:schemas-microsoft-com:office:office");
manager.AddNamespace("x", "urn:schemas-microsoft-com:office:excel");
manager.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet");
manager.AddNamespace("html", "http://www.w3.org/TR/REC-html40");
But when I try to access a node:
foreach (XmlNode row in document.SelectNodes("/Workbook/Worksheet[1]/Table/Row", manager))
I never get any results. I was under the impression that by adding the first namespace with an empty prefix, I wouldn't need to specify a prefix when searching for nodes in that namespace.
But, as stated in the documentation for the AddNamespace method:
If an XPath expression does not include a prefix, it is assumed that the namespace Uniform Resource Identifier (URI) is the empty namespace.
Why is that? And, more importantly: how do I access nodes in the default namespace if omitting the prefix puts them in the empty namespace?
What good is setting the default namespace on the manager if I can't even access it when searching for nodes?
@JLRishe's answer is correct for accessing nodes in the default namespace (i.e. always map a prefix to the default namespace in the XmlNamespaceManager).
The full context of the passage you quoted (MSDN, XmlNamespaceManager.AddNamespace) states that the default "empty" prefix is not used in XPath expressions:
prefix
Type: System.String
The prefix to associate with the namespace being added. Use String.Empty to add a default namespace.
Note: If the XmlNamespaceManager will be used for resolving namespaces in an XML Path Language (XPath) expression, a prefix must be specified. If an XPath expression does not include a prefix, it is assumed that the namespace Uniform Resource Identifier (URI) is the empty namespace. For more information about XPath expressions and the XmlNamespaceManager, refer to the XmlNode.SelectNodes and XPathExpression.SetContext methods.
From the XPath 1.0 spec:
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
So this is not a matter of the NamespaceManager but rather of the way XPath is defined to work.
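A minimal sketch of that rule in action, assuming a trimmed-down copy of the workbook from the question held in a string. The class name `DefaultNamespaceDemo` and the `CountRows` helper are made up for illustration; only the `XmlDocument` and `XmlNamespaceManager` calls come from the question.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    // Trimmed-down version of the workbook from the question (an assumption for this sketch).
    private const string Xml =
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"" +
        " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">" +
        "<Worksheet ss:Name=\"KA0100401\"><Table><Row><Cell>Data</Cell></Row></Table></Worksheet>" +
        "</Workbook>";

    // Hypothetical helper: counts the nodes that an XPath expression selects.
    public static int CountRows(string xpath)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
        // The empty prefix is registered, but XPath never consults it.
        manager.AddNamespace(String.Empty, "urn:schemas-microsoft-com:office:spreadsheet");
        manager.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet");

        return document.SelectNodes(xpath, manager).Count;
    }
}
```

Calling `DefaultNamespaceDemo.CountRows("/Workbook/Worksheet[1]/Table/Row")` returns 0, because the unprefixed names are looked up in the empty namespace, while `DefaultNamespaceDemo.CountRows("/ss:Workbook/ss:Worksheet[1]/ss:Table/ss:Row")` returns 1.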
The point that you're missing is that the prefixes you use in your NamespaceManager don't have to be anything like the ones in your XML document. You can use the xcel prefix for urn:schemas-microsoft-com:office:excel if you want, and the sp prefix for urn:schemas-microsoft-com:office:spreadsheet. In fact, you're already assigning a prefix for that URN in your namespace manager, so you can just use that:
foreach (XmlNode row in
document.SelectNodes("/ss:Workbook/ss:Worksheet[1]/ss:Table/ss:Row", manager))
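To illustrate that the prefix is purely a name chosen on the XPath side, here is a small sketch in which the document declares only a default namespace and the manager maps it to whatever prefix the caller passes in. The class `PrefixChoiceDemo`, the `CountRows` helper and the reduced XML are assumptions made for this example.

```csharp
using System;
using System.Xml;

public static class PrefixChoiceDemo
{
    // Only a default namespace is declared; the document itself uses no prefix at all.
    private const string Xml =
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\">" +
        "<Worksheet><Table><Row><Cell>Data</Cell></Row></Table></Worksheet>" +
        "</Workbook>";

    // Hypothetical helper: registers the spreadsheet namespace under the given
    // prefix and counts the rows found through that prefix.
    public static int CountRows(string prefix)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Xml);

        XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
        manager.AddNamespace(prefix, "urn:schemas-microsoft-com:office:spreadsheet");

        // Build the same path as above, but with the caller's prefix on every step.
        string xpath = String.Format(
            "/{0}:Workbook/{0}:Worksheet[1]/{0}:Table/{0}:Row", prefix);
        return document.SelectNodes(xpath, manager).Count;
    }
}
```

`PrefixChoiceDemo.CountRows("sp")` and `PrefixChoiceDemo.CountRows("ss")` both return 1: what gets matched is the namespace URI, not the prefix text.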
Regarding this question:
What good is setting the default namespace on the manager if I can't even access it when searching for nodes?
The good is that XmlNamespaceManager is used for more than just evaluating XPath. For example, it could be used to keep track of the namespaces in an XML document, in which there is a concept of default namespaces.
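As a rough sketch of that bookkeeping role, the manager can report and scope a default namespace on its own, with no XPath involved. The class `ManagerScopeDemo`, the `TrackDefaultNamespace` helper and the URI "urn:example:inner" are invented for this illustration; `DefaultNamespace`, `PushScope` and `PopScope` are members of XmlNamespaceManager.

```csharp
using System;
using System.Xml;

public static class ManagerScopeDemo
{
    // Hypothetical helper: records the default namespace before, inside,
    // and after a nested scope, the way a parser might track nested elements.
    public static string[] TrackDefaultNamespace()
    {
        XmlNamespaceManager manager = new XmlNamespaceManager(new NameTable());
        manager.AddNamespace(String.Empty, "urn:schemas-microsoft-com:office:spreadsheet");
        string outer = manager.DefaultNamespace;

        // Entering a child element that redeclares xmlns (made-up URI).
        manager.PushScope();
        manager.AddNamespace(String.Empty, "urn:example:inner");
        string inner = manager.DefaultNamespace;

        // Leaving that element restores the enclosing default namespace.
        manager.PopScope();
        string restored = manager.DefaultNamespace;

        return new[] { outer, inner, restored };
    }
}
```

The returned array holds the spreadsheet URI, then "urn:example:inner", then the spreadsheet URI again, which is the kind of state an XML reader or writer needs and an XPath expression does not.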
I can't answer your last question ("What good is ...") unless maybe it helps in non-XPath situations. But regarding "How do I access nodes in the default namespace, if not using a prefix sets them into an empty namespace?", the answer is that you have to use a prefix.
So in this case, since you declared the prefix ss as being bound to the namespace whose URI is urn:schemas-microsoft-com:office:spreadsheet, which is the same namespace as the default namespace, you can just use the ss prefix in your XPath expression:
foreach (XmlNode row in document.SelectNodes("/ss:Workbook/ss:Worksheet[1]/ss:Table/ss:Row",
manager))
I find that if you delete the default namespace declaration
xmlns="urn:schemas-microsoft-com:office:spreadsheet" <- delete it
or set the default namespace to empty
xmlns=""
then the XPath no longer needs a prefix in front of the elements that have no prefix in the document.
So, is the default namespace declaration really important?
If not, I could delete it, which makes searching with XPath much easier because the elements no longer need a prefix.
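One way to check what the declaration actually does is to compare the namespace of the root element with and without it. This is only a sketch: the class `DeclarationDemo` and the `RootNamespace` helper are made up, and the two XML strings are reduced stand-ins for the workbook.

```csharp
using System;
using System.Xml;

public static class DeclarationDemo
{
    // Hypothetical helper: loads the XML and reports which namespace
    // the root element ends up in.
    public static string RootNamespace(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        return document.DocumentElement.NamespaceURI;
    }
}
```

`DeclarationDemo.RootNamespace("<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"/>")` returns the spreadsheet URI, while `DeclarationDemo.RootNamespace("<Workbook/>")` returns an empty string. So deleting the declaration does not just simplify the XPath, it moves the elements into a different (empty) namespace, which may matter to any other consumer of the file that expects the original one.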
Another approach I have tried is to register the default namespace under a prefix of my own choosing, "default", and to write a method that automatically adds "default:" to every element that has no other prefix:
public static string XPathAddDeafultNameSpaceProccess(this string XPathProcess)
{
    // Naive split on '/': a path with '/' inside a predicate or string literal is not handled.
    string[] XPSplit = XPathProcess.Split('/');
    for (int i = 0; i < XPSplit.Length; i++)
    {
        string step = XPSplit[i];
        // Skip empty steps (from a leading "/" or from "//"), the "." and ".." steps,
        // attributes, and names that already carry a prefix.
        if (step.Length > 0 && step != "." && step != ".."
            && !step.Contains(":") && !step.Contains("@"))
            XPSplit[i] = "default:" + step;
    }
    // Put the steps back together with "/" between them.
    return string.Join("/", XPSplit);
}
It can turn
"aa/xx:cc/dd/hh:gg/bb"
to
"default:aa/xx:cc/default:dd/hh:gg/default:bb"