When doing performance testing, why are the initial runs slower?

Published: 2019-07-21 03:11

Question:

It seems like every time I run performance tests, there is a "wind down" period in the first few iterations: the early timings are noticeably higher before they stabilize.

Here's the performance testing code (in this case, I was testing the difference between Lambda and LINQ):

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Sandbox
{
    public class Program
    {
        private static long sum = 0;
        private static int count = 0;

        public class Item
        {
            public string name;
            public int id;
        }

        public static void Main(string[] args)
        {
            // START TESTING PARAMETERS
            List<Item> items = new List<Item>();

            for (int i = 0; i < 1000; i++)
            {
                items.Add(new Item
                {
                    id = i,
                    name = "name_" + i.ToString()
                });
            }

            // END TESTING PARAMETERS

            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (int j = 0; j < 10; j++)
            {
                for (int i = 0; i < 5000; i++)
                {
                    // START TESTING CODE

                    Item itm = items.Find(x => x.name == "name_" + i.ToString());

                    // END TESTING CODE
                }
                sum += sw.ElapsedMilliseconds;
                count++;
                sw.Restart();
                Console.WriteLine("Average: {0}", sum / count);
            }
        }
    }
}

And here are the average results of 5 iterations of 100,000 test runs:

Average: 1023    Average: 1079    Average: 1017    Average: 1147    Average: 1054
Average: 1003    Average: 963     Average: 1001    Average: 1007    Average: 1020
Average: 1009    Average: 926     Average: 951     Average: 958     Average: 966
Average: 972     Average: 908     Average: 927     Average: 934     Average: 936
Average: 946     Average: 896     Average: 922     Average: 919     Average: 918
Average: 931     Average: 889     Average: 926     Average: 910     Average: 907
Average: 919     Average: 883     Average: 916     Average: 903     Average: 899
Average: 911     Average: 880     Average: 908     Average: 898     Average: 893
Average: 904     Average: 877     Average: 902     Average: 894     Average: 899
Average: 899     Average: 874     Average: 909     Average: 891     Average: 894
Average: 895     Average: 873     Average: 926     Average: 889     Average: 890
Average: 898     Average: 871     Average: 937     Average: 886     Average: 887
Average: 898     Average: 869     Average: 944     Average: 884     Average: 907
Average: 894     Average: 868     Average: 938     Average: 882     Average: 921
Average: 891     Average: 868     Average: 934     Average: 881     Average: 923
Average: 889     Average: 867     Average: 929     Average: 880     Average: 919
Average: 887     Average: 866     Average: 925     Average: 884     Average: 916
Average: 885     Average: 866     Average: 931     Average: 892     Average: 912
Average: 889     Average: 865     Average: 927     Average: 902     Average: 909
Average: 891     Average: 870     Average: 924     Average: 907     Average: 917

Any reason why each time I do testing, there is a wind down period?

Answer 1:

You want to take a look at Eric Lippert's series on performance tests:

Mistake #6: Treat the first run as nothing special when measuring average performance.

In order to get a good result out of a benchmark test in a world with potentially expensive startup costs due to jitting code, loading libraries and calling static constructors, you've got to apply some careful thought about what you're actually measuring.

If, for example, you are benchmarking for the specific purpose of analyzing startup costs then you're going to want to make sure that you measure only the first run. If on the other hand you are benchmarking part of a service that is going to be running millions of times over many days and you wish to know the average time that will be taken in a typical usage then the high cost of the first run is irrelevant and therefore shouldn't be part of the average. Whether you include the first run in your timings or not is up to you; my point is, you need to be cognizant of the fact that the first run has potentially very different costs than the second.

...

Moreover, it's important to note that different jitters give different results on different machines and in different versions of the .NET framework. The time taken to jit can vary greatly, as can the amount of optimization generated in the machine code. The jit compilers on the Windows 32 bit desktop, Windows 64 bit desktop, Silverlight running on a Mac, and the "compact" jitter that runs when you have a C# program in XNA on XBOX 360 all have potentially different performance characteristics.

In short, JIT compilation is expensive, and you shouldn't factor it into your measurements unless that is what you want to measure. It depends on typical usage: if your code starts up once and stays running for long periods, discard the first runs; but if it mostly starts and stops, the cost of the first run matters and should be kept.
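A common way to apply this advice (a minimal sketch, not the original poster's code; `AverageMilliseconds` is a hypothetical helper) is to invoke the code under test once before the stopwatch starts, so that JIT compilation, static constructors, and cold caches are paid for outside the timed region:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Benchmark
{
    // Runs `action` once untimed to absorb one-time startup costs
    // (JIT compilation, static constructors, cache population),
    // then times the remaining runs and returns their mean.
    public static double AverageMilliseconds(Action action, int runs)
    {
        action(); // warm-up run: deliberately excluded from the average

        var sw = new Stopwatch();
        var samples = new List<double>();
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            samples.Add(sw.Elapsed.TotalMilliseconds);
        }
        return samples.Average();
    }
}
```

If instead you specifically want the startup cost, time only that first call and discard the rest.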



Answer 2:

The reason is that during the first iterations most data and code aren't cached yet — in CPU caches, operating system caches, disk caches, database caches, and so on. In managed execution environments like .NET or Java, just-in-time compilation plays a role too. The second and subsequent iterations already find their data and code in those caches, so they generally run faster.

Hence it is a good idea to ignore the first iteration (or the first few, depending on complexity) and not count it in the statistics when measuring average times. The exact behavior, however, depends on the size of your data set, the complexity of the algorithm, dependencies such as database usage, the hardware, and many other factors.
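Dropping the warm-up samples from the statistics can be done after the fact as well — record every timing, then average only the tail. A sketch (the `AverageAfterWarmup` helper and its parameters are illustrative, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class WarmupAwareTiming
{
    // Times `iterations` runs of `action`, but averages only the samples
    // after the first `discard` runs, so cold-cache and JIT costs do not
    // skew the reported figure.
    public static double AverageAfterWarmup(Action action, int iterations, int discard)
    {
        var sw = new Stopwatch();
        var samples = new List<double>();
        for (int i = 0; i < iterations; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            samples.Add(sw.Elapsed.TotalMilliseconds);
        }
        return samples.Skip(discard).Average();
    }
}
```

Note that the code in the question computes a running average of all passes so far, which is why the early, slower passes keep dragging the printed figure down for many lines before it settles; per-sample timings with the first few discarded converge much faster.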