Pitfalls of code coverage [closed]

Posted 2019-03-08 08:20

男人必须洒脱

Question:

I'm looking for real world examples of some bad side effects of code coverage.

I noticed this happening at work recently because of a policy to achieve 100% code coverage. Code quality has been improving for sure but conversely the testers seem to be writing more lax test plans because 'well the code is fully unit tested'. Some logical bugs managed to slip through as a result. They were a REALLY BIG PAIN to debug because 'well the code is fully unit tested'.

I think that was partly because our tool did statement coverage only. Still, it could have been time better spent.

If anyone has other negative side effects of having a code coverage policy please share. I'd like to know what kind of other 'problems' are happening out there in the real-world.

Thanks in advance.

EDIT: Thanks for all the really good responses. There are a few which I would mark as the answer but I can only mark one unfortunately.

Answer 1:

In a sentence: Code coverage tells you what you definitely haven't tested, not what you have.

Part of building a valuable unit test suite is finding the most important, high-risk code and asking hard questions of it. You want to make sure the tough stuff works as a priority. Coverage figures have no notion of the 'importance' of code, nor the quality of tests.

In my experience, many of the most important tests you will ever write are the tests that barely add any coverage at all (edge cases that add a few extra % here and there, but find loads of bugs).

The problem with setting hard (and potentially counter-productive) coverage targets is that developers may have to start bending over backwards to test their code. There's making code testable, and then there's just torture. If you hit 100% coverage with great tests then that's fantastic, but in most situations the extra effort is just not worth it.

Furthermore, people start obsessing/fiddling with numbers rather than focussing on the quality of the tests. I've seen badly written tests that have 90+% coverage, just as I've seen excellent tests that only have 60-70% coverage.

Again, I tend to look at coverage as an indicator of what definitely hasn't been tested.

Answer 2:

In my experience, the biggest problem with code coverage tools is the risk that somebody will fall victim to the belief that "high code coverage" equals "good testing." Most coverage tools just offer statement coverage metrics, as opposed to condition, data path or decision coverage. That means that it's possible to get 100% coverage on a bit of code like this:

for (int i = 0; i < MAX_RETRIES; ++i) {
    if (someFunction() == MAGIC_NUMBER) {
        break;
    }
}

... without ever testing the termination condition on the for loop.
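A minimal sketch of that situation in Java. The class `RetryDemo`, the method `attempt`, and both constant values are hypothetical stand-ins for the snippet above, not anything from the original code. A single call in which the second attempt succeeds executes every statement, including `++i` and `break`, yet the loop condition is never seen to be false:

```java
import java.util.function.IntSupplier;

final class RetryDemo {
    static final int MAX_RETRIES = 3;   // assumed value, for illustration only
    static final int MAGIC_NUMBER = 42; // assumed value, for illustration only

    // Calls someFunction until it returns MAGIC_NUMBER or the retries run out.
    // Returns how many calls were made.
    static int attempt(IntSupplier someFunction) {
        int calls = 0;
        for (int i = 0; i < MAX_RETRIES; ++i) {
            calls++;
            if (someFunction.getAsInt() == MAGIC_NUMBER) {
                break;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        int[] results = {0, MAGIC_NUMBER};
        int[] next = {0};
        // Fails once, then succeeds: every statement runs, so statement
        // coverage reports 100%, but the loop never terminates via i < MAX_RETRIES.
        System.out.println(attempt(() -> results[next[0]++]));
        // Only a case like this one exercises the termination condition.
        System.out.println(attempt(() -> 0));
    }
}
```

A tool that reports branch or condition coverage would flag the missing "retries exhausted" case; a statement-only tool would not.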

Worse, it's possible to get very high "coverage" from a test that simply invokes your application, without bothering to validate the output, or validating it incorrectly.

Simply put, low code coverage is certainly an indication of insufficient testing, but high coverage is not an indication of sufficient or correct testing.

Answer 3:

Just because there's code coverage doesn't mean you're actually testing all paths through the function.

For example, this code has four paths:

if (A) { ... } else { ... }
if (B) { ... } else { ... }

However, just two tests (e.g. one with A and B true, one with A and B false) would give "100% code coverage."

This is a problem because the tendency is to stop testing once you've achieved the magic 100% number.
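To make the four-path point concrete, here is a small Java sketch. `PathsDemo` and `compute` are made-up names, and the bodies of the branches are invented for illustration. The two tests mentioned above execute every statement and both outcomes of each `if`, while one of the two untested combinations fails:

```java
final class PathsDemo {
    // Two independent decisions give four paths through the method.
    static int compute(boolean a, boolean b) {
        int divisor;
        if (a) { divisor = 0; } else { divisor = 2; }
        int result;
        if (b) { result = 10; } else { result = 10 / divisor; }
        return result;
    }

    public static void main(String[] args) {
        // These two calls alone reach every statement and every branch.
        System.out.println(compute(true, true));
        System.out.println(compute(false, false));
        // The path A = true, B = false was never exercised, and it divides by zero.
        try {
            compute(true, false);
        } catch (ArithmeticException e) {
            System.out.println("untested path failed: " + e.getMessage());
        }
    }
}
```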

Answer 4:

Sometimes corner cases are so rare they're not worth testing, yet a strict code-coverage rule requires you to test them anyway.

For example, in Java the MD5 algorithm is built in, but the API still declares an "unsupported algorithm" exception (NoSuchAlgorithmException). In practice it's never thrown, and your test would have to go through significant gyrations to reach that path.

It would be a lot of work wasted.
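For reference, the code in question typically looks something like the sketch below. `Md5Demo` and `md5` are hypothetical names, and rethrowing as `IllegalStateException` is just one common choice, not a recommendation from the answer. The `catch` block is the part a 100% rule would force you to cover:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Demo {
    static byte[] md5(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            return digest.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Checked exception declared by getInstance. On a standard JDK
            // this branch is effectively unreachable for "MD5".
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // An MD5 digest is always 16 bytes long.
        System.out.println(md5("hello").length);
    }
}
```

Covering the `catch` branch would mean swapping out security providers or wrapping `MessageDigest` behind an interface purely for the test's sake, which is the kind of gyration the answer is describing.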

Answer 5:

In my opinion, the greatest danger a team runs from measuring code coverage is that it rewards large tests and penalizes small ones. If you have the choice between writing a single test that covers a large portion of your application's functionality, and writing ten small tests that each test a single method, measuring only code coverage implies that you should write the large test.

However, writing the set of 10 small tests will give you much less brittle tests, and will test your application much more thoroughly than the one large test will. Thus, by measuring code coverage, particularly in an organization with still evolving testing habits, you can often set up the wrong incentives.

Answer 6:

I know this isn't a direct answer to your question, but...

Any testing, regardless of what type, is insufficient by itself. Unit testing/code coverage is for developers. QA still needs to test the system as a whole. Business users still need to test the system as a whole as well.

The converse (QA tests the code completely, so developers shouldn't test) is equally bad. Testing is complementary, and different tests provide different things. Each test type can miss things that another might find.

Just like the rest of development, don't take shortcuts with testing; it'll only let bugs through.

Answer 7:

Writing overly targeted test cases.
Insufficient testing of input variability.
Executing a large number of artificial test cases.
Losing sight of the important test failures amid the noise.
Difficulty in assigning defects, because many conditions from many components must interact for a line to execute.

The worst side effect of a 100% coverage goal is spending a large share of the test development cycle (75%+) hitting corner cases. Another poor effect of such a policy is the concentration on hitting a particular line of code rather than addressing the range of inputs. I don't really care that the strcpy function ran at least once; I care that it ran against a wide variety of input. Having a policy is good, but having an extremely draconian policy is bad. The 100% metric of code coverage is neither necessary nor sufficient for code to be considered solid.
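As a rough illustration of "range of inputs" versus "the line ran once", consider the Java sketch below. `boundedCopy` is a hypothetical stand-in for a bounded string copy (it is not strcpy), and it assumes `capacity` is non-negative. Any single call executes its only statement, so coverage is complete after one test; the interesting behavior only shows up across varied inputs:

```java
final class CopyDemo {
    // Keeps at most `capacity` characters of src. Assumes capacity >= 0.
    static String boundedCopy(String src, int capacity) {
        return src.length() <= capacity ? src : src.substring(0, capacity);
    }

    public static void main(String[] args) {
        // One call would already give full statement coverage.
        // These cases probe the input range instead: empty, short,
        // exact fit, too long, and zero capacity.
        String[] inputs = {"", "abc", "abcdefgh", "abcdefghij"};
        for (String input : inputs) {
            System.out.println("[" + boundedCopy(input, 8) + "]");
        }
        System.out.println("[" + boundedCopy("abc", 0) + "]");
    }
}
```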

Answer 8:

One of the largest pitfalls of code coverage is that people just talk about code coverage without actually specifying what type of code coverage they are talking about. The characteristics of C0, C1, C2 and even higher levels of code coverage are very different, so just talking about "code coverage" doesn't even make sense.

For example, achieving 100% full path coverage is pretty much impossible. If your program has n decision points, you need 2ⁿ tests (and depending on the definition, every single bit in a value is a decision point, so to achieve 100% full path coverage for an extremely simple function that just adds two ints, you need 18446744073709551616 tests). If you only have one loop, you already need infinitely many tests.
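The arithmetic behind that figure can be checked with a few lines of Java. `PathCountDemo` and `pathCount` are illustrative names, and the sketch assumes two 32-bit ints with every bit counted as a decision point, giving 64 decision points in total:

```java
import java.math.BigInteger;

final class PathCountDemo {
    // Number of distinct paths through n independent binary decisions: 2^n.
    static BigInteger pathCount(int decisionPoints) {
        return BigInteger.ONE.shiftLeft(decisionPoints);
    }

    public static void main(String[] args) {
        // Two independent decisions: 2^2 paths.
        System.out.println(pathCount(2));
        // Two 32-bit ints, one decision per bit: 2^64 paths.
        System.out.println(pathCount(2 * 32));
    }
}
```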

OTOH, achieving 100% C0 coverage is trivial.

Another important thing to remember is that code coverage does not tell you what code was tested. It only tells you what code was run! You can try it out yourself: take a codebase that has 100% code coverage. Remove all the assertions from the tests. Now the codebase still has 100% coverage, but does not test a single thing! So, code coverage does not tell you what's tested, only what's not tested.
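A tiny Java sketch of that experiment, with made-up names and no test framework so that it stays self-contained. `add` is deliberately wrong; the assertion-free "test" still runs it, so a coverage tool would count the method as covered, and the test still passes:

```java
final class NoAssertDemo {
    // Deliberately buggy: subtracts instead of adding.
    static int add(int a, int b) {
        return a - b;
    }

    // Executes add but checks nothing, so it can never fail.
    static boolean testWithoutAssertion() {
        add(2, 3);
        return true;
    }

    // Same call, but the result is actually checked.
    static boolean testWithAssertion() {
        return add(2, 3) == 5;
    }

    public static void main(String[] args) {
        System.out.println("without assertion passes: " + testWithoutAssertion());
        System.out.println("with assertion passes: " + testWithAssertion());
    }
}
```

Both tests produce identical coverage of `add`; only the second one notices the bug.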

Answer 9:

There are tools out there, Jumble for one, that go a step beyond branch coverage: they mutate your code and check whether your tests fail for each mutation.

Directly from their website:

Jumble is a class level mutation testing tool that works in conjunction with JUnit. The purpose of mutation testing is to provide a measure of the effectiveness of test cases. A single mutation is performed on the code to be tested, the corresponding test cases are then executed. If the modified code fails the tests, then this increases confidence in the tests. Conversely, if the modified code passes the tests this indicates a testing deficiency.
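The idea can be sketched by hand in Java. This is not Jumble's API or output; `MutationDemo`, the age rule, and both "suites" are invented purely to show what a surviving and a killed mutant look like:

```java
import java.util.function.IntPredicate;

final class MutationDemo {
    // Code under test, and a hand-made mutant with >= changed to >.
    static final IntPredicate ORIGINAL = age -> age >= 18;
    static final IntPredicate MUTANT = age -> age > 18;

    // Covers both outcomes of the comparison but skips the boundary.
    static boolean weakSuite(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    // Adds the boundary value, which distinguishes the mutant.
    static boolean strongSuite(IntPredicate isAdult) {
        return weakSuite(isAdult) && isAdult.test(18);
    }

    public static void main(String[] args) {
        // A mutant "survives" when the suite still passes against it.
        System.out.println("weak suite, mutant survives: " + weakSuite(MUTANT));
        System.out.println("strong suite, mutant survives: " + strongSuite(MUTANT));
    }
}
```

Both suites give the same coverage of the original predicate; only the mutation check reveals that the weak one would not notice an off-by-one change.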

Answer 10:

There's nothing wrong with code coverage; what I see as wrong is the 100% figure. At some point the law of diminishing returns kicks in and it becomes more expensive to test the last 1% than the other 99%. Code coverage is a worthy goal but common sense goes a long way.

Answer 11:

The idea that 100% code coverage means well-tested code is a complete myth. As developers we know the hard/complex/delicate parts of a system, and I would much rather see those areas properly tested with only 50% coverage than the meaningless assurance that every line has been run at least once.

In terms of a real world example, the only team that I was on that had 100% coverage wrote some of the worst code I've ever seen. 100% coverage was used to replace code review - the result was predictably awful, to the extent that most code was thrown away, even though it passed the tests.

Answer 12:

We have good tools for measuring code-coverage from unit tests. So it's tempting to rely on code-coverage of 100% to represent that you're "done testing." This is not true.

As other folks have mentioned, 100% code coverage doesn't prove that you have tested adequately, nor does 50% code coverage necessarily mean that you haven't tested adequately.

Measuring lines of code executed by tests is just one metric. You also have to test for a reasonable variety of function inputs, and also how the function or class behaves depending on some other external state. For example, some code functions differently based on the data in a database or in a file.

I've also blogged about this recently: http://karwin.blogspot.com/2009/02/unit-test-coverage.html

Answer 13:

100% code coverage doesn't mean you're done with unit tests.

int divide(int a, int b) {
    return a / b;
}

With just 1 unit test, I get 100% code coverage for this function:

return divide(4,2) == 2;

Now, nobody would argue that this unit test, despite its 100% coverage, shows that the feature works just fine; it never tries a zero divisor, for one.

I think code coverage is a good element to know if you are missing any obvious code path, but I would use it carefully.
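Here is a Java rendering of the same function with the cases the single test leaves out. `DivideDemo` is a made-up wrapper class, and the behavior shown is Java's integer division, which may differ from whatever language the original snippet was meant to be:

```java
final class DivideDemo {
    static int divide(int a, int b) {
        return a / b;
    }

    public static void main(String[] args) {
        // The original test: full coverage of divide from this line alone.
        System.out.println(divide(4, 2) == 2);
        // Integer division truncates toward zero.
        System.out.println(divide(7, 2));
        System.out.println(divide(-7, 2));
        // A zero divisor throws in Java.
        try {
            divide(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("divide(1, 0) threw: " + e.getMessage());
        }
        // Overflow case: the result silently wraps back to Integer.MIN_VALUE.
        System.out.println(divide(Integer.MIN_VALUE, -1) == Integer.MIN_VALUE);
    }
}
```

None of these extra cases adds a single percentage point of coverage, which is why the coverage number alone says little about whether `divide` is adequately tested.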

Tags: unit-testing code-coverage
