Performance testing best practices when doing TDD?

2019-03-13 09:01发布

问题:

I'm working on a project which is in serious need of some performance tuning.

How do I write a test that fails if my optimizations do not in improve the speed of the program?

To elaborate a bit:

The problem is not discovering which parts to optimize. I can use various profiling and benchmarking tools for that.

The problem is using automated tests to document that a specific optimization did indeed have the intended effect. It would also be highly desirable if I could use the test suite to discover possible performance regressions later on.

I suppose I could just run my profiling tools to get some values and then assert that my optimized code produces better values. The obvious problem with that, however, is that benchmarking values are not hard values. They vary with the local environment.

So, is the answer to always use the same machine to do this kind of integration testing? If so, you would still have to allow for some fuzziness in the results, since even on the same hardware benchmarking results can vary. How then to take this into account?

Or maybe the answer is to keep older versions of the program and compare results before and after? This would be my preferred method, since it's mostly environment agnostic. Does anyone have experience with this approach? I imagine it would only be necessary to keep one older version if all the tests can be made to pass if the performance of the latest version is at least as good as the former version.

回答1:

I suspect that applying TDD to drive performance is a mistake. By all means, use it to get to good design and working code, and use the tests written in the course of TDD to ensure continued correctness - but once you have well-factored code and a solid suite of tests, you are in good shape to tune, and different (from TDD) techniques and tools apply.

TDD gives you good design, reliable code, and a test coverage safety net. That puts you into a good place for tuning, but I think that because of the problems you and others have cited, it's simply not going to take you much further down the tuning road. I say that as a great fan and proponent of TDD and a practitioner.



回答2:

First you need to establish some criteria for acceptable performance, then you need to devise a test that will fail that criteria when using the existing code, then you need to tweak your code for performance until it passes the test. You will probably have more than one criteria for performance, and you should certainly have more than one test.



回答3:

In many server applications (might not be your case) performance problem manifest only under concurrent access and under load. Measuring absolute time a routine executes and trying to improve it is therefore not very helpful. There are problems with this method even in single-threaded applications. Measuring absolute routine time relies on the clock the platform is providing, and these are not always very precise; you better rely on average time a routine takes.

My advice is:

  • Use profiling to identify routines that execute the most times and take most time.
  • Use tool like JMeter or Grinder to elaborate representative test cases, simulate concurrent access, put your application under stress and measure (more importantly) throughput and average response time. This will give you a better idea of how your application is behaving as seen from the outside perspective.

While you could use unit tests to establish some non functional aspects of your application, I think that the approach given above will give better results during optimization process. When placing time-related assertions in your unit tests you will have to choose some very approximative values: time can vary depending on the environment you are using to run your unit tests. You don't want tests to fail only because some of your colleagues are using inferior hardware.

Tuning is all about finding right things to tune. You already have a functioning code, so placing performance related assertions a posteriori and without establishing critical sections of code might lead you to waste a lot of time on optimizing non-essential pieces of your application.



回答4:

Record the running time of the current code.

if (newCode.RunningTime >= oldCode.RunningTime) Fail


回答5:

Run the tests + profiling in CI server. You can also run load tests periodically.

You are concerned about differences (as you mentioned), so its not about defining an absolute value. Have an extra step that compares the performance measures of this run with the one of the last build, and report on the differences as %. You can raise a red flag for important variations of time.

If you are concerned on performance, you should have clear goals you want to meet and assert them. You should measure those with tests on the full system. Even if your application logic is fast, you might have issues with the view causing you to miss the goal. You can also combine it with the differences approach, but for these you would have less tolerance to time variations.

Note that you can run the same process in your dev computer, just using only the previous runs in that computer and not a shared one between developers.



回答6:

For the tuning itself, you can compare the old code and new code directly. But don't keep both copies around. This sounds like a nightmare to manage. Also, you're only ever comparing one version with another version. It's possible that a change in functionality will slow down your function, and that is acceptable to the users.

Personally, I've never seen performance criteria of the type 'must be faster than the last version', because it is so hard to measure.

You say 'in serious need of performance tuning'. Where? Which queries? Which functions? Who says, the business, the users? What is acceptable performance? 3 seconds? 2 seconds? 50 milliseconds?

The starting point for any performance analysis is to define the pass/fail criteria. Once you have this, you CAN automate the performance tests.

For reliability, you can use a (simple) statistical approach. For example, run the same query under the same conditions 100 times. If 95% of them return in under n seconds, that is a pass.

Personally, I would do this at integration time, from either a standard machine, or the integration server itself. Record the values for each test somewhere (cruise control has some nice features for this sort of thing). If you do this, you can see how performance progresses over time, and with each build. You can even make a graph. Managers like graphs.

Having a stable environment is always hard to do when doing performance testing, whether or not you're doing automated tests or not. You'll have that particular problem no matter how you develop (TDD, Waterfall, etc).



回答7:

Not faced this situation yet ;) however if I did, here's how I'd go about it. (I think I picked this up from Dave Astel's book)

Step#1: Come up with a spec for 'acceptable performance' so for example, this could mean 'The user needs to be able to do Y in N secs (or millisecs)'
Step#2: Now write a failing test.. Use your friendly timer class (e.g. .NET has the StopWatch class) and Assert.Less(actualTime, MySpec)
Step#3: If the test already passes, you're done. If red, you need to optimize and make it green. As soon as the test goes green, the performance is now 'acceptable'.



回答8:

kent beck and his team automated all the tests in TDD.

here for performance testing also we can automate the tests in TDD.

the criteria here in performance testing is we should test the yes or no cases

if we know the specfications well n good we can automate them also in TDD



回答9:

Whilst I broadly agree with Carl Manaster's answer, with modern tools it's possible to get some of the advantages that TDD offers for functional testing into performance testing.

With most modern performance testing frameworks (most of my experience is with Gatling, but I believe that same's true of newer versions of most performance test frameworks), it's possible to integrate automated performance tests into the continuous integration build, and configure it so that the CI build will fail if the performance requirements aren't met.

So provided it's possible to agree beforehand what your performance requirements are (which for some applications may be driven by SLAs agreed with users or clients), this can give you rapid feedback if a change has created a performance issue, and identify areas that need performance improvements.

Good performance requirements are along the lines of "when there are 5000 orders per hour, 95% of user journeys should include no more than 10 seconds of waiting, and no screen transition taking more than 1 second".

This also relies on having deployment to a production-like test environment in your CI pipeline.

However, it's probably still not a good idea to use performance requirements to drive your development in the same way that you could with functional requirements. With functional requirements, you generally have some insight into whether your application will pass the test before you run it, and it's sensible to try to write code that you think will pass. With performance, trying to optimize code whose performance hasn't been measured is a dubious practice. You can use performance results to drive your application development to some extent, just not performance requirements.