Case insensitive XML parser in c#

2020-02-05 12:52发布

问题:

Everything you do with XML is case sensitive, I know that.

However, right now I find myself in a situation, where the software I'm writing would yield much fewer errors if I somehow made xml name/attribute recognition case insensitive. Case insensitive XPath would be a god sent.

Is there an easy way/library to do that in c#?

回答1:

An XMl document can have two different elements named respectively: MyName and myName -- that are intended to be different. Converting/treating them as the same name is an error that can have gross consequences.

In case the above is not the case, then here is a more precise solution, using XSLT to process the document into one that only has lowercase element names and lowercase attribute names:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vUpper" select=
 "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

 <xsl:variable name="vLower" select=
 "'abcdefghijklmnopqrstuvwxyz'"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[name()=local-name()]" priority="2">
  <xsl:element name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="*" priority="1">
  <xsl:element name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*[name()=local-name()]" priority="2">
  <xsl:attribute name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="@*" priority="1">
  <xsl:attribute name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
     <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>
</xsl:stylesheet>

when this transformation is applied on any XML document, for example this one:

<authors xmlns:user="myNamespace">
  <?ttt This is a PI ?>
  <Author xmlns:user2="myNamespace2">
    <Name idd="VH">Victor Hugo</Name>
    <user2:Name idd="VH">Victor Hugo</user2:Name>
    <Nationality xmlns:user3="myNamespace3">French</Nationality>
  </Author>
  <!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
  <Author Period="classical">
    <Name>Sophocles</Name>
    <Nationality>Greek</Nationality>
  </Author>
  <author>
    <Name>Leo Tolstoy</Name>
    <Nationality>Russian</Nationality>
  </author>
  <Author>
    <Name>Alexander Pushkin</Name>
    <Nationality>Russian</Nationality>
  </Author>
  <Author Period="classical">
    <Name>Plato</Name>
    <Nationality>Greek</Nationality>
  </Author>
</authors>

the wanted, correct result (element and attribute names converted to lowercase) is produced:

<authors><?ttt This is a PI ?>
   <author>
      <name idd="VH">Victor Hugo</name>
      <user2:name xmlns:user2="myNamespace2" idd="VH">Victor Hugo</user2:name>
      <nationality>French</nationality>
   </author><!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
   <author period="classical">
      <name>Sophocles</name>
      <nationality>Greek</nationality>
   </author>
   <author>
      <name>Leo Tolstoy</name>
      <nationality>Russian</nationality>
   </author>
   <author>
      <name>Alexander Pushkin</name>
      <nationality>Russian</nationality>
   </author>
   <author period="classical">
      <name>Plato</name>
      <nationality>Greek</nationality>
   </author>
</authors>

Once the document is converted to your desired form, then you can perform any desired processing on the converted document.



回答2:

You can create case-insensitive methods (extensions for usability), e.g.:

public static class XDocumentExtensions
{
    public static IEnumerable<XElement> ElementsCaseInsensitive(this XContainer source,  
        XName name)
    {
        return source.Elements()
            .Where(e => e.Name.Namespace == name.Namespace 
                && e.Name.LocalName.Equals(name.LocalName, StringComparison.OrdinalIgnoreCase));
    }
}


回答3:

XML is text. Just ToLower it before loading to whatever parser you are using.

So long as you don't have to validate against a schema and don't mind the values being all lower case, this should work just fine.


The fact is that any XML parser will be case sensitive. If it were not, it wouldn't be an XML parser.



回答4:

I would start by converting all tags and attribute names to lowercase, leaving values untouched, by using SAX parsing, ie. with XmlTextReader.



回答5:

I use another solution. The reason people want this is because you don't want to duplicate the name of the property in the class file in an attribute as well. So what I do is add a custom attribute to all properties:

[AttributeUsage(AttributeTargets.Property)]
public class UsePropertyNameToLowerAsXmlElementAttribute: XmlElementAttribute
{
    public UsePropertyNameToLowerAsXmlElementAttribute([CallerMemberName] string propertyName = null)
    : base(propertyName?.ToLower())
    {
    }
}

This way the XML serializer can map lower case properties to CamelCased classes.

The properties on the classes still have a decorator that says that something is different, but you don't have the overhead of marking every property with a name:

public class Settings
{
    [UsePropertyNameToLowerAsXmlElement]
    public string VersionId { get; set; }

    [UsePropertyNameToLowerAsXmlElement]
    public int? ApplicationId { get; set; }
}