Code documentation: How much is too much?

2019-02-07 00:02发布

问题:

How much code documentation in your .NET source is too much?

Some background: I inherited a large codebase that I've talked about in some of the other questions I've posted here on SO. One of the "features" of this codebase is a God Class, a single static class with >3000 lines of code encompassing several dozen static methods. It's everything from Utilities.CalculateFYBasedOnMonth() to Utilities.GetSharePointUserInfo() to Utilities.IsUserIE6(). It's all good code that doesn't need to be rewritten, just refactored into an appropriate set of libraries. I have that planned out.

Since these methods are moving into a new business layer, and my role on this project is to prepare the system for maintenance by other developers, I'm thinking about solid code documentation. While these methods all have good inline comments, they don't all have good (or any) code doco in the form of XML comments. Using a combo of GhostDoc and Sandcastle (or Document X), I can create some pretty nice HTML documentation and post it to SharePoint, which would let developers understand more about what the code does without navigating through the code itself.

As the amount of documentation in the code increases, the more difficult it becomes to navigate the code. I'm beginning to wonder if the XML comments will make the code more difficult to maintain than, say, a simpler //comment would on each method.

These examples are from the Document X sample:

        /// <summary>
        /// Adds a new %Customer:CustomersLibrary.Customer% to the collection.
        /// </summary>
        /// <returns>A new Customer instance that represents the new customer.</returns>
        /// <example>
        ///     The following example demonstrates adding a new customer to the customers
        ///     collection. 
        ///     <code lang="CS" title="Example">
        /// CustomersLibrary.Customer newCustomer = myCustomers.Add(CustomersLibrary.Title.Mr, "John", "J", "Smith");
        ///     </code>
        ///     <code lang="VB" title="Example">
        /// Dim newCustomer As CustomersLibrary.Customer = myCustomers.Add(CustomersLibrary.Title.Mr, "John", "J", "Smith")
        ///     </code>
        /// </example>
        /// <seealso cref="Remove">Remove Method</seealso>
        /// <param name="Title">The customers title.</param>
        /// <param name="FirstName">The customers first name.</param>
        /// <param name="MiddleInitial">The customers middle initial.</param>
        /// <param name="LastName">The customers last name.</param>
        public Customer Add(Title Title, string FirstName, string MiddleInitial, string LastName)
        {
            // create new customer instance
            Customer newCust = new Customer(Title, FirstName, MiddleInitial, LastName);

            // add to internal collection
            mItems.Add(newCust);

            // return ref to new customer instance
            return newCust;
        }

And:

    /// <summary>
    /// Returns the number of %Customer:CustomersLibrary.Customer% instances in the collection.
    /// </summary>
    /// <value>
    /// An Int value that specifies the number of Customer instances within the
    /// collection.
    /// </value>
    public int Count
    {
        get 
        {
            return mItems.Count;
        }
    }

So I was wondering from you: do you document all of your code with XML comments with the goal of using something like NDoc (RIP) or Sandcastle? If not, how do you decide what gets documentation and what doesn't? Something like an API would obviously have doco, but what about a codebase that you're going to hand off to another team to maintain?

What do you think I should do?

回答1:

I think a good part of the problem here is the verbose and crufty XML documentation syntax MS has foisted on us (JavaDoc wasn't much better either). The question of how to format it is, to a large degree, independent of how much is appropriate.

Using the XML format for comments is optional. You can use DOxygen or other tools that recognize different formats. Or write your own document extractor -- it isn't as hard as you might think to do a reasonable job and is a good learning experience.

The question of how much is more difficult. I think the idea of self-documenting code is fine, if you are digging in to maintain some code. If you are just a client, you shouldn't need to read the code to understand how a given function works. Lots of information is implicit in the data types and names, of course, but there is a great deal that is not. For instance, passing in a reference to an object tells you what is expected, but not how a null reference will be handled. Or in the OP's code, how any whitespace at the beginning or the end of the arguments are handled. I believe there is far more of this type of information that ought to be documented than is usually recognized.

To me it requires natural language documentation to describe the purpose of the function as well as any pre- and post-conditions for the function, its arguments, and return values which cannot be expressed through the programming language syntax.



回答2:

Nobody's mentioned your code doesn't need to be bloated, the XML documentation can be in another file:

/// <include file="Documentation/XML/YourClass.xml" path="//documentation/members[@name='YourClass']/*"/>

And then your Add method can contain no extra XML/comments above it, or if you prefer just the summary (as that's merged with the separate file).

It's far more powerful than the rubbish format that is Javadoc and derivatives you find in PHP/Javascript (though Javadoc paved the way for the XML syntax). Plus the tools available are far superior and the default look of the help docs is more readable and easier to customise (I can say that from having written doclets and comparing that process to Sandcastle/DocProject/NDoc).



回答3:

You're hitting a critical divide here between those that will be maintaining the new libraries and those that will be consuming the new libraries.

If I'm writing a new application and will be using these standard libraries, I should be getting a stable binary of the libraries and simply importing them into my application, not copying the source code down from a location and potentially causing problems if the code gets modified. In that case, I won't have access to any of the "self documenting" features other than the name of the method and the input/output parameters, and even those won't be exposed if I'm using some kind of IDE that doesn't have the auto complete featured turned on.

So in your example code above, I think it looks just fine. Things are not too verbose within the code itself and the names are self documenting. On the flip side, all of the necessary summary/parameter data is there so that a solid documentation piece could be constructed to allow those consuming the library to have all the critical information at their fingertips. Sadly XML is rather bloated, but by in large I think most developers can easily browse right by all the summary content and look into the actual code within the method.



回答4:

Jeff has a really good article about commenting (or should I say, not commenting) here...

http://www.codinghorror.com/blog/archives/001150.html

I know it seems like I'm not answering the question, but I think it's a valid point that code should be as self-documenting as possible.



回答5:

I tend to document all public methods in my own code; using GhostDoc makes this trivial. And in order to reduce the clutter when I edit my source code, I generally just collapse the comments by going first into "outline mode" (i.e. use Visual Studio's Outline > Collapse to definitions command).

I've never tried Sandcastle, but I really appreciate the comfort provided by Intellisense for the methods I have XML-commented.



回答6:

I always opt for the XML / Javadoc format comments, because I love being able to browse API documentation in a sensible format (HTML usually).

It does become a problem for browsing the actual source code, but I find that this is generally a minor issue, since Visual Studio is generally pretty smart about collapsing XML comments as necessary.



回答7:

Don't repeat yourself.

  • The first example should have a better method name and no comments at all.
  • The second example should not have a comment.

The name of the first method should reflect that a new object is allocated.

If that behavior is standard throughout the framework for each add, it should be documented on a higher level, not in this method API doc. Otherwise, change the name.

Comments should add information, not hide it in noise. And there should be comments, where necessary in XML. And where they add value.

I do not want to see: "returns the count" for a method named count.



回答8:

All public functions must be clearly understandable by someone who has a passing familiarity with your code base, but NOT in your specific section without having to delve into the code.

If you need to write a short line to explain what a function does, chances are you named your function/classes poorly. The name should be self explanatory in that case

If it requires more than 1 brief sentence to explain, that's probably a good comment

If it takes a paragraph, your function is probably doing too much besides likely unclear names.

It's usually better to err on the side of comments IF YOU MAKE SURE THEY ARE ACCURATE. Inaccurate and/or unmaintainable comments are worse than no comments

So applying these rules:

In your first example: "// create new customer instance" is redundant. The code is crystal clear. THe other comments are perfect. They clarify what the code is operating upon/what it's resuls are

In your second example the comments are wasted effort and make it hard to read. All you need to do is give the function a proper name. Not that vague "count". That is poor naming.



回答9:

I recently conducted a study that shows that if you have important "directives "e.g., Caller must do X" within a lot of specifications (e.g., "this method does X which means Y and Z"), there is a very high risk that your readers would miss the directives. In fact, when they see a long documentation, they skip reading it alltogether.

So at the least, separate the important stuff or use tagging (ask me if you use Java).



回答10:

It all depends on the standards your company is using, but for my crew, we document at the top of every function like in your second example (which by the way you can do in Visual Studio 2008 by hitting the "/" key 3 times in a row at the top of any sub or function!!).

The first example is overkill, especially the bottom couple of lines where each line is commented. However, I think the stuff at the header of the function might be useful though, because we use it here a lot. And it appears to be somewhat standard from what I can tell from lots of other programmers.



回答11:

I've seen coding standards that recommend against commenting self-commenting code and method overloads. While YMMV, it sounds like a good way to get away from the "Field _numberOfCars is an integer that represents the number of cars"-type comments that lead into overkill.



回答12:

Comments in a header for generating documentation is a good thing. Putting comments in code to explain why you are doing what you are doing is also usually a good thing. Putting in redundant comments paraphrasing what you did is not a good thing



回答13:

What you have shown is FAR TOO MUCH. Do your self a favour and delete it!

Code should first off be self documenting, through meaningful method and parameter names. In the example you have shown;

public Customer Add(Title Title, string FirstName, string MiddleInitial, string LastName) is perfectly understandable to the intent of what is happening, as is 'Count'.

Commenting such as this, as you pointed out, is purely noise around what is otherwise easy to read code. Most developers will sooner open up examine and use the code, than pile through obscure auto-generated API documentation. Everytime!



回答14:

By the way, according to "Clean code" (A great book, BTW), one should avoid using HTML/XML markups within comments that are embedded in the source code. Even if your IDE can create nifty documentation when you hover, it is considered too distracting and unreadable when you just browse your sources.