I'm looking for real world examples of some bad side effects of code coverage.
I noticed this happening at work recently because of a policy to achieve 100% code coverage. Code quality has been improving for sure but conversely the testers seem to be writing more lax test plans because 'well the code is fully unit tested'. Some logical bugs managed to slip through as a result. They were a REALLY BIG PAIN to debug because 'well the code is fully unit tested'.
I think that was partly because our tool did statement coverage only. Still, it could have been time better spent.
If anyone has other negative side effects of having a code coverage policy please share. I'd like to know what kind of other 'problems' are happening out there in the real-world.
Thanks in advance.
EDIT: Thanks for all the really good responses. There are a few which I would mark as the answer but I can only mark one unfortunately.
Just because there's code coverage doesn't mean you're actually testing all paths through the function.
For example, this code has four paths:
However just two tests (e.g. one with A and B true, one with A and B false) would give "100% code coverage."
This is a problem because the tendency is to stop testing once you've achieved the magic 100% number.
In my experience, the biggest problem with code coverage tools is the risk that somebody will fall victim to the belief that "high code coverage" equals "good testing." Most coverage tools just offer statement coverage metrics, as opposed to condition, data path or decision coverage. That means that it's possible to get 100% coverage on a bit of code like this:
... without ever testing the termination condition on the for loop.
Worse, it's possible to get very high "coverage" from a test that simply invokes your application, without bothering to validate the output, or validating it incorrectly.
Simply put, low code coverage levels is certainly an indication of insufficient testing, but high coverage levels are not an indication of sufficient or correct testing.
In a sentence: Code coverage tells you what you definitely haven't tested, not what you have.
Part of building a valuable unit test suite is finding the most important, high-risk code and asking hard questions of it. You want to make sure the tough stuff works as a priority. Coverage figures have no notion of the 'importance' of code, nor the quality of tests.
In my experience, many of the most important tests you will ever write are the tests that barely add any coverage at all (edge cases that add a few extra % here and there, but find loads of bugs).
The problem with setting hard and (potentially counter-productive) coverage targets is that developers may have to start bending over backwards to test their code. There's making code testable, and then there's just torture. If you hit 100% coverage with great tests then that's fantastic, but in most situations the extra effort is just not worth it.
Furthermore, people start obsessing/fiddling with numbers rather than focussing on the quality of the tests. I've seen badly written tests that have 90+% coverage, just as I've seen excellent tests that only have 60-70% coverage.
Again, I tend to look at coverage as an indicator of what definitely hasn't been tested.
The worst side effect of having a 100% coverage goal is to spend a lot of the testing development cycle (75%+) hiting corner cases. Another poor effect of such a policy is the concentration of hitting a particular line of code rather than addressing the range of inputs. I don't really care that the strcpy function ran at least once. I really care that it ran against a wide variety of input. Having a policy is good. But having any extremely draconian policy is bad. The 100% metric of code coverage is neither necessary nor sufficient for code to be considered solid.
One of the largest pitfalls of code coverage is that people just talk about code coverage without actually specifying what type of code coverage they are talking about. The characteristics of C0, C1, C2 and even higher levels of code coverage are very different, so just talking about "code coverage" doesn't even make sense.
For example, achieving 100% full path coverage is pretty much impossible. If your program has
n
decision points, you need 2n tests (and depending on the definition, every single bit in a value is a decision point, so to achieve 100% full path coverage for an extremely simple function that just adds twoint
s, you need 18446744073709551616 tests). If you only have one loop, you already need infinitely many tests.OTOH, achieving 100% C0 coverage is trivial.
Another important thing to remember, is that code coverage does not tell you what code was tested. It only tells you what code was run! You can try it out yourself: take a codebase that has 100% code coverage. Remove all the assertions from the tests. Now the codebase still has 100% coverage, but does not test a single thing! So, code coverage does not tell you what's tested, only what's not tested.
Nothing wrong with code coverage - what I see wrong is the 100% figure. At some point the law of diminished returns kicks in and it becomes more expensive to test the last 1% than the other 99%. Code coverage is a worthy goal but common sense goes a long way.