Parse XDocument without having to keep specifying

2020-07-04 13:08发布

问题:

I have some XML data (similar to the sample below) and I want to read the values in code.

Why am I forced to specify the default namespace to access each element? I would have expected the default namespace to be used for all elements.

Is there a more logical way to achieve my goal?

Sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<ReceiptsBatch xmlns="http://www.secretsonline.gov.uk/secrets">
    <MessageHeader>
        <MessageID>00000173</MessageID>
        <Timestamp>2009-10-28T16:50:01</Timestamp>
        <MessageCheck>BX4f+RmNCVCsT5g</MessageCheck>
    </MessageHeader>
    <Receipts>
        <Receipt>
            <Status>OK</Status>
        </Receipt>
    </Receipts>
</ReceiptsBatch>

Code to read xml elements I'm after:

XDocument xDoc = XDocument.Load( FileInPath );

XNamespace ns = "http://www.secretsonline.gov.uk/secrets";

XElement MessageCheck = xDoc.Element(ns+ "MessageHeader").Element(ns+"MessageCheck");
XElement MessageBody = xDoc.Element("Receipts");

回答1:

The theory is that the meaning of the document is not affected by the user's choice of namespace prefixes. So long as the data is in the namespace http://www.secretsonline.gov.uk/secrets, it doesn't matter whether the author chooses to use the prefix "s", "secrets", "_x.cafe.babe", or the "null" prefix (that is, making it the default namespace). Your application shouldn't care: it's only the URI that matters. That's why your application has to specify the URI.



回答2:

As suggested by this answer, you can do this by removing all namespaces from the in-memory copy of the document. I suppose this should only be done if you know you won't have name collisions in the resulting document.

/// <summary>
/// Makes parsing easier by removing the need to specify namespaces for every element.
/// </summary>
private static void RemoveNamespaces(XDocument document)
{
    var elements = document.Descendants();
    elements.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
    foreach (var element in elements)
    {
        element.Name = element.Name.LocalName;

        var strippedAttributes =
            from originalAttribute in element.Attributes().ToArray()
            select (object)new XAttribute(originalAttribute.Name.LocalName, originalAttribute.Value);

        //Note that this also strips the attributes' line number information
        element.ReplaceAttributes(strippedAttributes.ToArray());
    }
}


回答3:

You can use XmlTextReader.Namespaces property to disable namespaces while reading XML file.

string filePath;
XmlTextReader xReader = new XmlTextReader(filePath);
xReader.Namespaces = false;
XDocument xDoc = XDocument.Load(xReader);


回答4:

This is how the Linq-To-Xml works. You can't find any element, if it is not in default namespace, and the same is true about its descendants. The fastest way to get rid from namespace is to remove link to the namespace from your initial XML.



回答5:

Note that the element Receipts is also in namespace http://www.secretsonline.gov.uk/secrets, so the XNamespace would also be required for the access to the element:

XElement MessageBody = xDoc.Element(ns + "Receipts");

As an alternative to using namespaces note that you can use "namespace agnostic" xpath using local-name() and namespace-uri(), e.g.

/*[local-name()='SomeElement' and namespace-uri()='somexmlns']

If you omit the namespace-uri predicate:

/*[local-name()='SomeElement']

Would match ns1:SomeElement and ns2:SomeElement etc. IMO I would always prefer XNamespace where possible, and the use-cases for namespace-agnostic xpath are quite limited, e.g. for parsing of specific elements in documents with unknown schemas (e.g. within a service bus), or best-effort parsing of documents where the namespace can change (e.g. future proofing, where the xmlns changes to match a new version of the document schema)