How to Move Child Elements to Attributes of Parent

Posted 2019-07-15 00:15

I currently have a rather large XML file (roughly 800MB). I've made several attempts to work with it in its current form (one of them involved compression); however, none have been very successful because they take too long.

The XML file structure is similar to the following (the way it is generated pre-dates me):

<Name>Something</Name>
<Description>Some description.</Description>
<CollectionOfObjects>
    <Object>
        <Name>Name Of Object</Name>
        <Description>Description of object.</Description>
        <AltName>Alternate name</AltName>
        <ContainerName>Container</ContainerName>
        <Required>true</Required>
        <Length>1</Length>
        <Info>
            <Name>Name</Name>
            <File>Filename</File>
            <Size>20</Size>
            <SizeUnit>MB</SizeUnit>
        </Info>
    </Object>
</CollectionOfObjects>

There is quite a large chunk of data under each object, and a lot of these child nodes can be made into attributes on their parents:

<CollectionOfObjects Name="Something" Description="Some description.">
    <Object Name="Name Of Object" AltName="Alternate name" Container="Container" Required="true" Length="1" Description="Description of object.">
        <Info Name="Name" File="Filename" Size="20" SizeUnit="MB" />
    </Object>
</CollectionOfObjects>

Obviously, not everything under each node will become an attribute; the above is just an example. There is so much data in this file that it breaks Notepad, and Visual Studio takes approximately 2 minutes just to open it. Heaven help you if you try to search the file, because that takes an hour or longer.

You can see how this is problematic. I tested the size difference with a demo file (obviously not with this one): converting the unnecessary child nodes into attributes reduced the demo file's size by 53%. I have no doubt that doing the same to this file will reduce its size by 30% or more (hoping for more).

Now that you understand the why, let's get to the question: how do I move these child nodes to attributes? The file is generated via XmlSerializer, which uses reflection to build the nodes based on the classes and properties available:

using System;
using System.ComponentModel;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

internal class DemoClass {
    [CategoryAttribute("Properties"), DescriptionAttribute("The name of this object.")]
    public string Name { get; set; }
}

internal bool Serialize(DemoClass demo, FileStream fs) {
    XmlSerializer serializer = new XmlSerializer(typeof(DemoClass));
    XmlWriter writer = null;
    bool result = true;
    try {
        XmlWriterSettings settings = new XmlWriterSettings() {
            Indent = true,
            IndentChars = ("\t"),
            Encoding = Encoding.UTF8,
            NewLineOnAttributes = false,
            NewLineChars = Environment.NewLine,
            NewLineHandling = NewLineHandling.Replace
        };
        writer = XmlWriter.Create(fs, settings);
        serializer.Serialize(writer, demo);
    } catch {
        result = false;
    } finally {
        // writer is still null if XmlWriter.Create threw, so guard the call.
        writer?.Close();
    }
    return result;
}

It is my understanding that I can just add the [XmlAttribute] attribute to a property and all future versions of the file will write that property as an XML attribute; however, I was told that converting existing data from the old layout to the new one may require some kind of "binder", which I am not familiar with.
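As a minimal sketch of what that looks like (DemoItem and AttributeDemo are hypothetical names, not the asker's classes), marking a property with [XmlAttribute] moves it onto the parent tag, while unmarked properties stay as child elements. Note that a class changed this way expects the new layout: as far as I know, XmlSerializer skips unrecognized child elements by default, so reading an old file with it would leave those properties unset, which is why a separate conversion step is needed.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for illustration only.
public class DemoItem
{
    [XmlAttribute]                 // written as Name="..." on <DemoItem>
    public string Name { get; set; }

    [XmlAttribute]                 // written as Length="..." on <DemoItem>
    public int Length { get; set; }

    // No [XmlAttribute]: stays a <Description> child element.
    public string Description { get; set; }
}

public static class AttributeDemo
{
    // Serializes the item to an XML string.
    public static string ToXml(DemoItem item)
    {
        var serializer = new XmlSerializer(typeof(DemoItem));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, item);
            return writer.ToString();
        }
    }

    // Reads an item back from XML written in the attribute layout.
    public static DemoItem FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(DemoItem));
        using (var reader = new StringReader(xml))
        {
            return (DemoItem)serializer.Deserialize(reader);
        }
    }
}
```

Calling AttributeDemo.ToXml(new DemoItem { Name = "A", Length = 1, Description = "d" }) should produce a DemoItem tag carrying Name and Length as attributes and a single Description child.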

Any recommendations are going to be helpful here.

NOTE: I know the following settings can reduce the file size as well (they dropped it by 28%):

Indent = false,
Encoding = Encoding.UTF8,
NewLineOnAttributes = false,

Update: I am currently trying to simply put the [XmlAttribute] attribute on the properties, and I've hit an error (which I expected) where reflection fails on deserialization:

There was an error reflecting type DemoClass.

Update 2: I'm now working a new angle. I've decided to copy all of the needed classes and mark the copies with [XmlAttribute], then load the old file with the old classes and write the new file with the new classes. If this works, it will be a great workaround. However, I'm sure there's a way to do this without a workaround.

Update 3: The method in Update 2 (above) did not work the way I expected, and I ran into another issue along the way. Since that approach is also heavily involved, I ended up writing a custom conversion method that uses the original serialization to load the XML and then builds a new XML document by hand with XDocument from the System.Xml.Linq namespace. This was a time-consuming task, but it means less overall change in the long run. It serializes the file in the expected way (with some tweaking here and there, of course). The next step was to update the old serialization code now that the old files had been converted. I've made it approximately 80% of the way through this process, and I'm still hitting some road bumps with reflection:
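The hand-built conversion described in Update 3 might look roughly like the sketch below. LegacyObject and ManualConverter are hypothetical stand-ins for the asker's real classes, which the question does not show; the object is assumed to have already been loaded with the original serializer.

```csharp
using System.Xml.Linq;

// Hypothetical stand-in for a class loaded with the old serializer.
public class LegacyObject
{
    public string Name { get; set; }
    public string Description { get; set; }
    public bool Required { get; set; }
}

public static class ManualConverter
{
    // Builds the attribute-based form of one object by hand.
    public static XElement ToAttributeForm(LegacyObject source)
    {
        var element = new XElement("Object");

        // SetAttributeValue adds nothing when the value is null,
        // so unset properties are simply left out of the output.
        element.SetAttributeValue("Name", source.Name);
        element.SetAttributeValue("Description", source.Description);

        // LINQ to XML writes booleans in XML form ("true" / "false").
        element.SetAttributeValue("Required", source.Required);
        return element;
    }
}
```

The resulting XElement can be added to a parent element or an XDocument and saved, which keeps the conversion independent of how the classes are attributed for XmlSerializer.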

The type for XmlAttribute may not be specified for primitive types.

This occurs when attempting to deserialize an enum value. The serializer seems to believe it is a string value instead.
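The question does not show the failing declaration, so the cause can only be guessed at. One possibility (an assumption) is that a type argument was passed to the attribute, for example through the XmlAttributeAttribute(Type) constructor. For comparison, the sketch below uses a plain [XmlAttribute] with no type argument on an enum property, which to my knowledge round-trips without that error. SizeUnit, InfoItem and EnumAttributeDemo are hypothetical names.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical enum and class for illustration only.
public enum SizeUnit { KB, MB, GB }

public class InfoItem
{
    [XmlAttribute]
    public string Name { get; set; }

    // Plain [XmlAttribute], no Type argument: the enum member
    // name is written as the attribute value, e.g. SizeUnit="MB".
    [XmlAttribute]
    public SizeUnit SizeUnit { get; set; }
}

public static class EnumAttributeDemo
{
    // Serializes and immediately deserializes the item.
    public static InfoItem RoundTrip(InfoItem item, out string xml)
    {
        var serializer = new XmlSerializer(typeof(InfoItem));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, item);
            xml = writer.ToString();
        }
        using (var reader = new StringReader(xml))
        {
            return (InfoItem)serializer.Deserialize(reader);
        }
    }
}
```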

1 Answer
家丑人穷心不美
Reply #2 · 2019-07-15 00:38

Here's the code that worked for me.

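using System.Linq;
using System.Xml.Linq;

static void Main()
{
    var element = XElement.Load(@"C:\Users\user\Downloads\CollectionOfObjects.xml");
    ElementsToAttributes(element);
    element.Save(@"C:\Users\user\Downloads\CollectionOfObjects-copy.xml");
}

// Recursively turns every leaf child element (no attributes, no child
// elements) into an attribute on its parent.
// Note: throws if two leaf siblings share a name, because attribute
// names must be unique within an element.
static void ElementsToAttributes(XElement element)
{
    // ToList() takes a snapshot so elements can be removed while iterating.
    foreach (var el in element.Elements().ToList())
    {
        if (!el.HasAttributes && !el.HasElements)
        {
            var attribute = new XAttribute(el.Name, el.Value);
            element.Add(attribute);
            el.Remove();
        }
        else
        {
            ElementsToAttributes(el);
        }
    }
}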

The XML in CollectionOfObjects.xml:

<CollectionOfObjects>
  <Name>Something</Name>
  <Description>Some description.</Description>
  <Object>
    <Name>Name Of Object</Name>
    <Description>Description of object.</Description>
    <AltName>Alternate name</AltName>
    <ContainerName>Container</ContainerName>
    <Required>true</Required>
    <Length>1</Length>
    <Info>
      <Name>Name</Name>
      <File>Filename</File>
      <Size>20</Size>
      <SizeUnit>MB</SizeUnit>
    </Info>
  </Object>
</CollectionOfObjects>

The resulting XML in CollectionOfObjects-copy.xml:

<?xml version="1.0" encoding="utf-8"?>
<CollectionOfObjects Name="Something" Description="Some description.">
  <Object Name="Name Of Object" Description="Description of object." AltName="Alternate name" ContainerName="Container" Required="true" Length="1">
    <Info Name="Name" File="Filename" Size="20" SizeUnit="MB" />
  </Object>
</CollectionOfObjects>
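One caveat with the conversion above: an element cannot carry two attributes with the same name, so a parent with repeated leaf children (for example several Tag elements) would make it fail. A hedged variant, with SafeConverter as a hypothetical name, leaves repeated or clashing names as elements:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SafeConverter
{
    // Like the answer's method, but only converts a leaf child when its
    // name is unique among its siblings and not already used by an attribute.
    public static void ElementsToAttributes(XElement element)
    {
        // Snapshot the children so they can be removed while iterating.
        var children = element.Elements().ToList();

        // Count each child name once up front instead of per child.
        var nameCounts = children
            .GroupBy(c => c.Name)
            .ToDictionary(g => g.Key, g => g.Count());

        foreach (var el in children)
        {
            bool isLeaf = !el.HasAttributes && !el.HasElements;
            bool isUnique = nameCounts[el.Name] == 1;
            bool noClash = element.Attribute(el.Name) == null;

            if (isLeaf && isUnique && noClash)
            {
                element.SetAttributeValue(el.Name, el.Value);
                el.Remove();
            }
            else
            {
                // Repeated names stay as elements; still descend into them.
                ElementsToAttributes(el);
            }
        }
    }
}
```

Like the original, this loads the whole document into memory and does not handle XML namespaces specially, so treat it as a starting point rather than a drop-in solution for an 800MB file.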
