How much code documentation in your .NET source is too much?
Some background: I inherited a large codebase that I've talked about in some of the other questions I've posted here on SO. One of the "features" of this codebase is a God Class, a single static class with >3000 lines of code encompassing several dozen static methods. It's everything from Utilities.CalculateFYBasedOnMonth() to Utilities.GetSharePointUserInfo() to Utilities.IsUserIE6(). It's all good code that doesn't need to be rewritten, just refactored into an appropriate set of libraries. I have that planned out.
Since these methods are moving into a new business layer, and my role on this project is to prepare the system for maintenance by other developers, I'm thinking about solid code documentation. While these methods all have good inline comments, they don't all have good (or any) code doco in the form of XML comments. Using a combo of GhostDoc and Sandcastle (or Document X), I can create some pretty nice HTML documentation and post it to SharePoint, which would let developers understand more about what the code does without navigating through the code itself.
As the amount of documentation in the code increases, the code becomes more difficult to navigate. I'm beginning to wonder if the XML comments will make the code more difficult to maintain than, say, a simpler //comment would on each method.
These examples are from the Document X sample:
/// <summary>
/// Adds a new %Customer:CustomersLibrary.Customer% to the collection.
/// </summary>
/// <returns>A new Customer instance that represents the new customer.</returns>
/// <example>
/// The following example demonstrates adding a new customer to the customers
/// collection.
/// <code lang="CS" title="Example">
/// CustomersLibrary.Customer newCustomer = myCustomers.Add(CustomersLibrary.Title.Mr, "John", "J", "Smith");
/// </code>
/// <code lang="VB" title="Example">
/// Dim newCustomer As CustomersLibrary.Customer = myCustomers.Add(CustomersLibrary.Title.Mr, "John", "J", "Smith")
/// </code>
/// </example>
/// <seealso cref="Remove">Remove Method</seealso>
/// <param name="Title">The customers title.</param>
/// <param name="FirstName">The customers first name.</param>
/// <param name="MiddleInitial">The customers middle initial.</param>
/// <param name="LastName">The customers last name.</param>
public Customer Add(Title Title, string FirstName, string MiddleInitial, string LastName)
{
    // create new customer instance
    Customer newCust = new Customer(Title, FirstName, MiddleInitial, LastName);
    // add to internal collection
    mItems.Add(newCust);
    // return ref to new customer instance
    return newCust;
}
And:
/// <summary>
/// Returns the number of %Customer:CustomersLibrary.Customer% instances in the collection.
/// </summary>
/// <value>
/// An Int value that specifies the number of Customer instances within the
/// collection.
/// </value>
public int Count
{
    get
    {
        return mItems.Count;
    }
}
So my question to you: do you document all of your code with XML comments with the goal of using something like NDoc (RIP) or Sandcastle? If not, how do you decide what gets documentation and what doesn't? Something like an API would obviously have doco, but what about a codebase that you're going to hand off to another team to maintain?
What do you think I should do?
I always opt for the XML / Javadoc format comments, because I love being able to browse API documentation in a sensible format (HTML usually).
It does become a problem for browsing the actual source code, but I find that this is generally a minor issue, since Visual Studio is generally pretty smart about collapsing XML comments as necessary.
I recently conducted a study which shows that if you have important "directives" (e.g., "Caller must do X") buried within a lot of specifications (e.g., "this method does X, which means Y and Z"), there is a very high risk that your readers will miss the directives. In fact, when they see long documentation, they skip reading it altogether.
So at the least, separate the important stuff or use tagging (ask me if you use Java).
I've seen coding standards that recommend against commenting self-commenting code and method overloads. While YMMV, it sounds like a good way to get away from the "Field _numberOfCars is an integer that represents the number of cars"-type comments that lead into overkill.
Nobody's mentioned that your code doesn't need to be bloated; the XML documentation can be kept in another file.
And then your Add method can contain no extra XML/comments above it, or if you prefer just the summary (as that's merged with the separate file).
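As a rough sketch of how that can look, the C# compiler supports an <include> documentation tag that pulls comments in from an external XML file when XML documentation output is enabled. The file name CustomerDocs.xml, the element layout inside it, and the stub Customer and Title types are assumptions made up for this illustration; only the file and path attributes belong to the tag itself.

```csharp
// CustomerDocs.xml (hypothetical file kept alongside the source):
//
// <docs>
//   <members name="Customers">
//     <Add>
//       <summary>Adds a new customer to the collection.</summary>
//       <returns>The newly created customer.</returns>
//     </Add>
//   </members>
// </docs>

using System.Collections.Generic;

// Stub types so the sketch compiles on its own; the real ones live elsewhere.
public enum Title { Mr, Ms }

public class Customer
{
    public Customer(Title title, string firstName, string middleInitial, string lastName) { }
}

public class Customers
{
    private readonly List<Customer> mItems = new List<Customer>();

    // The path is an XPath expression selecting the child elements of <Add>
    // in the external file; they are merged in as this member's documentation.
    /// <include file='CustomerDocs.xml' path='docs/members[@name="Customers"]/Add/*' />
    public Customer Add(Title title, string firstName, string middleInitial, string lastName)
    {
        var newCust = new Customer(title, firstName, middleInitial, lastName);
        mItems.Add(newCust);
        return newCust;
    }
}
```

The source file then carries one line of documentation per member, while the generated help still gets the full text.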
It's far more powerful than the rubbish format that is Javadoc and the derivatives you find in PHP/JavaScript (though Javadoc paved the way for the XML syntax). Plus the tools available are far superior, and the default look of the help docs is more readable and easier to customise (I can say that from having written doclets and comparing that process to Sandcastle/DocProject/NDoc).
I think a good part of the problem here is the verbose and crufty XML documentation syntax MS has foisted on us (JavaDoc wasn't much better either). The question of how to format it is, to a large degree, independent of how much is appropriate.
Using the XML format for comments is optional. You can use Doxygen or other tools that recognize different formats. Or write your own document extractor; it isn't as hard as you might think to do a reasonable job, and it is a good learning experience.
The question of how much is more difficult. I think the idea of self-documenting code is fine if you are digging in to maintain some code. If you are just a client, you shouldn't need to read the code to understand how a given function works. Lots of information is implicit in the data types and names, of course, but there is a great deal that is not. For instance, passing in a reference to an object tells you what is expected, but not how a null reference will be handled. Or, in the OP's code, how any whitespace at the beginning or end of the arguments is handled. I believe there is far more of this type of information that ought to be documented than is usually recognized.
To me it requires natural language documentation to describe the purpose of the function as well as any pre- and post-conditions for the function, its arguments, and return values which cannot be expressed through the programming language syntax.
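To make that concrete, here is a minimal sketch of comments that record only what the signature cannot express: null handling, whitespace handling, and the guarantee on the return value. The NameFormatter class and its Format method are hypothetical, invented for this illustration, and are not part of the OP's code.

```csharp
using System;

// Hypothetical helper, used only to illustrate contract-style comments.
public static class NameFormatter
{
    /// <summary>Builds a display name such as "John J. Smith".</summary>
    /// <param name="firstName">Required. Leading and trailing whitespace is trimmed.</param>
    /// <param name="middleInitial">
    /// Optional. Null, empty, or whitespace omits the initial; otherwise it is
    /// trimmed and followed by a period.
    /// </param>
    /// <param name="lastName">Required. Leading and trailing whitespace is trimmed.</param>
    /// <returns>Never null. Parts are separated by single spaces.</returns>
    /// <exception cref="ArgumentException">
    /// <paramref name="firstName"/> or <paramref name="lastName"/> is null, empty, or whitespace.
    /// </exception>
    public static string Format(string firstName, string middleInitial, string lastName)
    {
        if (string.IsNullOrWhiteSpace(firstName))
            throw new ArgumentException("First name is required.", nameof(firstName));
        if (string.IsNullOrWhiteSpace(lastName))
            throw new ArgumentException("Last name is required.", nameof(lastName));

        // A missing middle initial contributes nothing, not even a space.
        string middle = string.IsNullOrWhiteSpace(middleInitial)
            ? ""
            : " " + middleInitial.Trim() + ".";

        return firstName.Trim() + middle + " " + lastName.Trim();
    }
}
```

None of these comments restate the parameter names; each one answers a question a caller would otherwise have to read the method body to settle.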
What you have shown is FAR TOO MUCH. Do yourself a favour and delete it!
Code should first of all be self-documenting, through meaningful method and parameter names. In the example you have shown:
public Customer Add(Title Title, string FirstName, string MiddleInitial, string LastName) makes the intent of what is happening perfectly clear, as does 'Count'.
Commenting such as this, as you pointed out, is purely noise around what is otherwise easy-to-read code. Most developers will sooner open up, examine, and use the code than wade through obscure auto-generated API documentation. Every time!