I currently have an XML file that is rather large (roughly 800 MB). I've made several attempts to work with it in its current condition (here is one dealing with compression); however, none have been very successful because they take quite some time.
The XML file structure is similar to the following (its generation pre-dates me):
    <Name>Something</Name>
    <Description>Some description.</Description>
    <CollectionOfObjects>
        <Object>
            <Name>Name Of Object</Name>
            <Description>Description of object.</Description>
            <AltName>Alternate name</AltName>
            <ContainerName>Container</ContainerName>
            <Required>true</Required>
            <Length>1</Length>
            <Info>
                <Name>Name</Name>
                <File>Filename</File>
                <Size>20</Size>
                <SizeUnit>MB</SizeUnit>
            </Info>
        </Object>
    </CollectionOfObjects>
There is quite a large chunk of data under each object, and a lot of these child nodes can be made into attributes on their parents:
    <CollectionOfObjects Name="Something" Description="Some description.">
        <Object Name="Name Of Object" AltName="Alternate name" Container="Container" Required="true" Length="1" Description="Description of object.">
            <Info Name="Name" File="Filename" Size="20" SizeUnit="MB" />
        </Object>
    </CollectionOfObjects>
Now, obviously not everything under each node will become an attribute; the above is just an example. There is so much data in this file that it breaks Notepad and takes Visual Studio approximately 2 minutes to even open. Heaven help you if you try to search the file, because that takes an hour or longer.
You can see how this is problematic. I tested the size difference on a demo file (obviously not on this one): converting the unnecessary child nodes into attributes reduced the demo file's size by 53%. I have no doubt that performing the same work on this file will reduce its size by 30% or more (hoping for the more).
Now that you understand the why, let's get to the question: how do I move these child nodes to attributes? The file is generated via XmlSerializer, which uses reflection to build the nodes based on the classes and properties available:
    // XmlSerializer can only process public types, so the class is public here.
    public class DemoClass {
        [CategoryAttribute("Properties"), DescriptionAttribute("The name of this object.")]
        public string Name { get; set; }
    }

    internal bool Serialize(DemoClass demo, FileStream fs) {
        XmlSerializer serializer = new XmlSerializer(typeof(DemoClass));
        XmlWriter writer = null;
        bool result = true;
        try {
            XmlWriterSettings settings = new XmlWriterSettings() {
                Indent = true,
                IndentChars = "\t",
                Encoding = Encoding.UTF8,
                NewLineOnAttributes = false,
                NewLineChars = Environment.NewLine,
                NewLineHandling = NewLineHandling.Replace
            };
            writer = XmlWriter.Create(fs, settings);
            serializer.Serialize(writer, demo);
        } catch {
            result = false;
        } finally {
            // writer is still null if XmlWriter.Create threw, so guard the call.
            writer?.Close();
        }
        return result;
    }
It is my understanding that I can just add the XmlAttribute attribute to a property and all future versions of the file will be written with that property as an XML attribute; however, I was told that in order to convert the data from the old format to the new one I may need some kind of "binder", which I am unsure about.
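For readers following along, here is a minimal sketch of what marking properties with `[XmlAttribute]` looks like. The `DemoObject` and `DemoInfo` classes and the `AttributeDemo` helper are hypothetical stand-ins, not the real classes behind the file. Note that XmlSerializer treats attributes and child elements as distinct, so a file written in the old element form will generally not populate properties mapped this way, which is why some one-time conversion step is needed.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for one of the serialized classes.
public class DemoObject
{
    // Simple values (string, bool, int, ...) can be written as XML attributes.
    [XmlAttribute] public string Name { get; set; }
    [XmlAttribute] public bool Required { get; set; }
    [XmlAttribute] public int Length { get; set; }

    // Complex members cannot be attributes and stay as child elements.
    public DemoInfo Info { get; set; }
}

// Hypothetical nested type; its simple members also become attributes.
public class DemoInfo
{
    [XmlAttribute] public string File { get; set; }
    [XmlAttribute] public int Size { get; set; }
}

public static class AttributeDemo
{
    // Serializes the object to a string so the resulting shape can be inspected.
    public static string ToXml(DemoObject value)
    {
        var serializer = new XmlSerializer(typeof(DemoObject));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Calling `AttributeDemo.ToXml(new DemoObject { Name = "Name Of Object", Required = true, Length = 1, Info = new DemoInfo { File = "Filename", Size = 20 } })` should produce a `DemoObject` element carrying `Name`, `Required` and `Length` as attributes, with `Info` as its only child element.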
Any recommendations are going to be helpful here.
NOTE: I know the following settings can be used to reduce the file size as well (this dropped it by 28%):
    Indent = false,
    Encoding = Encoding.UTF8,
    NewLineOnAttributes = false,
Update: I am currently attempting to simply use the XmlAttribute attribute on properties, and I've encountered an error (which I expected) where the reflection failed on deserialization:
    There was an error reflecting type DemoClass.
Update 2: Now working a new angle here. I've decided to copy all of the needed classes and update the copies with the XmlAttribute attribute, then load the old file with the old classes and write the new file with the new classes. If this works then it'll be a great workaround. However, I'm sure there's a way to do this without the workaround.
Update 3: The method in Update 2 (above) did not work the way I expected and I ended up encountering this issue. Since that approach is also heavily involved, I ended up writing a custom conversion method that used the original serialization to load the XML and then, using XDocument from the System.Xml.Linq namespace, created a new XML document by hand. This was a time-consuming task, but it meant less overall change in the long run. It serializes the file in the way expected (with some tweaking here and there, of course). The next step was to update the old serialization code now that the old files had been converted. I've made it approximately 80% of the way through this process, still hitting some road bumps here and there with reflection:
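The conversion code itself is not shown, so the following is only a rough sketch of one way a System.Xml.Linq pass could move simple child elements onto their parent as attributes. `ElementToAttributeConverter` and `Promote` are hypothetical names, and the sketch assumes the elements are not in an XML namespace and that any child whose name repeats among its siblings should stay an element. Keep in mind that XDocument and XElement hold the whole document in memory, which matters for a file of this size.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ElementToAttributeConverter
{
    // Moves "simple" children (no attributes, no child elements) of each
    // element onto that element as attributes, working bottom-up.
    public static void Promote(XElement element)
    {
        // Process descendants first so nested elements are converted
        // before their parent inspects them.
        foreach (var child in element.Elements().ToList())
        {
            Promote(child);
        }

        var leaves = element.Elements()
            .Where(e => !e.HasElements && !e.HasAttributes)
            .ToList();

        foreach (var leaf in leaves)
        {
            string name = leaf.Name.LocalName;

            // Attribute names must be unique, so keep repeated children
            // (and anything clashing with an existing attribute) as elements.
            bool repeated = element.Elements(leaf.Name).Count() > 1;
            if (repeated || element.Attribute(name) != null)
            {
                continue;
            }

            element.SetAttributeValue(name, leaf.Value);
            leaf.Remove();
        }
    }
}
```

A typical call would be `var doc = XDocument.Load(path); ElementToAttributeConverter.Promote(doc.Root); doc.Save(newPath);`, where `path` and `newPath` are placeholders. A real conversion would likely need an explicit list of which elements to promote rather than this blanket rule.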
    The type for XmlAttribute may not be specified for primitive types.
This occurs when attempting to deserialize an enum value. The serializer seems to believe it is a string value instead.
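The failing declaration is not shown, so this is an assumption: the wording of the message suggests that the `Type` property was set on the XmlAttribute attribute for an enum or primitive member. A plain `[XmlAttribute]` with no `Type` argument is normally enough for an enum, as the sketch below illustrates. `SizeUnit`, `InfoDemo` and `EnumAttributeDemo` are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical enum standing in for the real one.
public enum SizeUnit { KB, MB, GB }

public class InfoDemo
{
    [XmlAttribute]
    public int Size { get; set; }

    // No Type argument: the serializer infers the enum type from the
    // property and writes the member name as the attribute value.
    [XmlAttribute]
    public SizeUnit Unit { get; set; }
}

public static class EnumAttributeDemo
{
    // Serializes the value to a string, then reads it back, so both
    // directions of the mapping are exercised.
    public static InfoDemo RoundTrip(InfoDemo value, out string xml)
    {
        var serializer = new XmlSerializer(typeof(InfoDemo));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            xml = writer.ToString();
        }
        using (var reader = new StringReader(xml))
        {
            return (InfoDemo)serializer.Deserialize(reader);
        }
    }
}
```

If the real property does carry something like `[XmlAttribute(Type = typeof(...))]`, removing the `Type` argument may be worth trying first.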
Here's the code that worked for me.
The XML in CollectionOfObjects.xml
The resulting XML in CollectionOfObjects-copy.xml