I have a class that serializes a set of objects (using XML serialization) that I want to unit test.
My problem is it feels like I will be testing the .NET implementation of XML serialization, instead of anything useful. I also have a slight chicken and egg scenario where in order to test the Reader, I will need a file produced by the Writer to do so.
I think the questions (there's 3 but they all relate) I'm ultimately looking for feedback on are:
- Is it possible to test the Writer, without using the Reader?
- What is the best strategy for testing the reader (XML file? Mocking with record/playback)? Is it the case that all you will really be doing is testing property values of the objects that have been deserialized?
- What is the best strategy for testing the writer!
Background info on Xml serialization
I'm not using a schema, so all XML elements and attributes match the objects' properties. As there is no schema, tags/attributes which do not match those found in properties of each object, are simply ignored by the XmlSerializer (so the property's value is null or default). Here is an example
<MyObject Height="300">
<Name>Bob</Name>
<Age>20</Age>
<MyObject>
would map to
public class MyObject
{
public string Name { get;set; }
public int Age { get;set; }
[XmlAttribute]
public int Height { get;set; }
}
and visa versa. If the object changed to the below the XML would still deserialize succesfully, but FirstName would be blank.
public class MyObject
{
public string FirstName { get;set; }
public int Age { get;set; }
[XmlAttribute]
public int Height { get;set; }
}
An invalid XML file would deserialize correctly, therefore the unit test would pass unless you ran assertions on the values of the MyObject.
There are a lot of types that serialization can not cope with etc. Also if you have your attributes wrong, it is common to get an exception when trying to read the xml back.
I tend to create an example tree of the objects that can be serialized with at least one example of each class (and subclass). Then at a minimum serialize the object tree to a stringstream and then read it back from the stringstream.
You will be amazed the number of time this catches a problem and save me having to wait for the application to start up to find the problem. This level of unit testing is more about speeding up development rather then increasing quality, so I would not do it for working serialization.
As other people have said, if you need to be able to read back data saved by old versions of your software, you had better keep a set of example data files for each shipped version and have tests to confirm you can still read them. This is harder then it seems at first, as the meaning of fields on a object may change between versions, so just being able to create the current object from a old serialized file is not enough, you have to check that the meaning is the same as it was it the version of the software that saved the file. (Put a version attribute in your root object now!)
In my experience it is definitely worth doing, especially if the XML is going to be used as an XML document by the consumer. For example, the consumer may need to have every element present in the document, either to avoid null checking of nodes when traversing or to pass schema validation.
By default the XML serializer will omit properties with a null value unless you add the [XmlElement(IsNullable = true)] attribute. Similarly, you may have to redirect generic list properties to standard arrays with an XMLArray attribute.
As another contributor said, if the object is changing over time, you need to continuously check that the output is consistent. It will also protect you against the serializer itself changing and not being backwards compatible, although you'd hope that this doesn't happen.
So for anything other than trivial uses, or where the above considerations are irrelevant, it is worth the effort of unit testing it.
I agree with you that you will be testing the .NET implementation more than you'll be testing your own code. But if that's what you want to do (perhaps you don't trust the .NET implementation :) ), I might approach your three questions as follows.
Yes, it's certainly possible to test the writer without the reader. Use the writer to serialize the example (20-year old Bob) you provided to a MemoryStream. Open the MemoryStream with an XmlDocument. Assert the root node is named "MyObject". Assert it has one attribute named "Height" with value "300". Assert there is a "Name" element containing a text node with value "Bob". Assert there is an "Age" element containing a text node with value "20".
Just do the reverse process of #1. Create an XmlDocument from the 20-year old Bob XML string. Deserialize the stream with the reader. Assert the Name property equals "Bob". Assert the Age property equals 20. You can do things like add test case with insignificant whitespace or single quotes instead of double-quotes to be more thorough.
See #1. You can extend it by adding what you consider to be tricky "edge" cases you think could break it. Names with various Unicode characters. Extra long names. Empty names. Negative ages. Etc.
We do acceptance testing of our serialization rather than unit testing.
What this means is that our acceptance testers take the XML schema, or as in your case some sample XML, and re-create their own serializable data-transfer class.
We then use NUnit to test our WCF service with this clean-room XML.
With this technique we've identified many, many errors. For example, where we have changed the name of the .NET member and forgotten to add an
[XmlElement]
tag with aName =
property.Seeing how you can't really fix serialization, you shouldn't be testing it - instead, you should be testing your own code and the way it interacts with the serialization mechanism. For example, you might need to unit-test the structure of the data you're serializing to make sure that no-one accidentally changes a field or something.
Speaking of which, I have recently adopted a practice where I check such things at compile-time rather than during execution of unit tests. It's a bit tedious, but I have a component that can traverse the AST, and then I can read it in a T4 template and write lots of
#error
messages if I meet something that shouldn't be there.I would argue that it is essential to unit test serialization if it is vitally important that you can read data between versions. And you must test with "known good" data (i.e. it isn't sufficient to simply write data in the current version and then read it again).
You mention that you don't have a schema... why not generate one? Either by hand (it isn't very hard), or with
xsd.exe
. Then you have something to use as a template, and you can verify this just usingXmlReader
. I'm doing a lot of work with xml serialization at the moment, and it is a lot easier to update the schema than it is to worry about whether I'm getting the data right.Even
XmlSerializer
can get complex; particularly if you involve subclasses ([XmlInclude]
), custom serialization (IXmlSerializable
), or non-defaultXmlSerializer
construction (passing additional metadata at runtime to the ctor). Another possibility is creative use of[XmlIngore]
,[XmlAnyAttribute]
or[XmlAnyElement]
; for example you might support unexpected data for round-trip (only) in version X, but store it in a known property in version Y.With serialization in general:
The reason is simple: you can break the data! How badly you do this depends on the serializer; for example, with
BinaryFormatter
(and I know the question isXmlSerializer
), simply changing from:to
could be enough to break serialization, as the field name has changed (and
BinaryFormatter
loves fields).There are other occasions when you might accidentally rename the data (even in contract-based serializers such as
XmlSerializer
/DataContractSerializer
). In such cases you can usually override the wire identifiers (for example[XmlAttribute("name")]
etc), but it is important to check this!Ultimately, it comes down to: is it important that you can read old data? It usually is; so don't just ship it... prove that you can.