Performance of Arrays vs. Lists

2019-01-01 08:32发布

Say you need to have a list/array of integers which you need iterate frequently, and I mean extremely often. The reasons may vary, but say it's in the heart of the inner most loop of a high volume processing.

In general, one would opt for using Lists (List) due to their flexibility in size. On top of that, msdn documentation claims Lists use an array internally and should perform just as fast (a quick look with Reflector confirms this). Neverless, there is some overhead involved.

Did anyone actually measure this? would iterating 6M times through a list take the same time as an array would?

13条回答
浮光初槿花落
2楼-- · 2019-01-01 08:38

Very easy to measure...

In a small number of tight-loop processing code where I know the length is fixed I use arrays for that extra tiny bit of micro-optimisation; arrays can be marginally faster if you use the indexer / for form - but IIRC believe it depends on the type of data in the array. But unless you need to micro-optimise, keep it simple and use List<T> etc.

Of course, this only applies if you are reading all of the data; a dictionary would be quicker for key-based lookups.

Here's my results using "int" (the second number is a checksum to verify they all did the same work):

(edited to fix bug)

List/for: 1971ms (589725196)
Array/for: 1864ms (589725196)
List/foreach: 3054ms (589725196)
Array/foreach: 1860ms (589725196)

based on the test rig:

using System;
using System.Collections.Generic;
using System.Diagnostics;
static class Program
{
    static void Main()
    {
        List<int> list = new List<int>(6000000);
        Random rand = new Random(12345);
        for (int i = 0; i < 6000000; i++)
        {
            list.Add(rand.Next(5000));
        }
        int[] arr = list.ToArray();

        int chk = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = list.Count;
            for (int i = 0; i < len; i++)
            {
                chk += list[i];
            }
        }
        watch.Stop();
        Console.WriteLine("List/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            for (int i = 0; i < arr.Length; i++)
            {
                chk += arr[i];
            }
        }
        watch.Stop();
        Console.WriteLine("Array/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in list)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("List/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in arr)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Array/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        Console.ReadLine();
    }
}
查看更多
像晚风撩人
3楼-- · 2019-01-01 08:40

In some brief tests I have found a combination of the two to be better in what I would call reasonably intensive Math:

Type: List<double[]>

Time: 00:00:05.1861300

Type: List<List<double>>

Time: 00:00:05.7941351

Type: double[rows * columns]

Time: 00:00:06.0547118

Running the Code:

int rows = 10000;
int columns = 10000;

IMatrix Matrix = new IMatrix(rows, columns);

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();


for (int r = 0; r < Matrix.Rows; r++)
    for (int c = 0; c < Matrix.Columns; c++)
        Matrix[r, c] = Math.E;

for (int r = 0; r < Matrix.Rows; r++)
    for (int c = 0; c < Matrix.Columns; c++)
        Matrix[r, c] *= -Math.Log(Math.E);


stopwatch.Stop();
TimeSpan ts = stopwatch.Elapsed;

Console.WriteLine(ts.ToString());

I do wish we had some top notch Hardware Accelerated Matrix Classes like the .NET Team have done with the System.Numerics.Vectors Class!

C# could be the best ML Language with a bit more work in this area!

查看更多
明月照影归
4楼-- · 2019-01-01 08:42

The measurements are nice, but you are going to get significantly different results depending on what you're doing exactly in your inner loop. Measure your own situation. If you're using multi-threading, that alone is a non-trivial activity.

查看更多
浅入江南
5楼-- · 2019-01-01 08:46

if you are just getting a single value out of either (not in a loop) then both do bounds checking (you're in managed code remember) it's just the list does it twice. See the notes later for why this is likely not a big deal.

If you are using your own for(int int i = 0; i < x.[Length/Count];i++) then the key difference is as follows:

  • Array:
    • bounds checking is removed
  • Lists
    • bounds checking is performed

If you are using foreach then the key difference is as follows:

  • Array:
    • no object is allocated to manage the iteration
    • bounds checking is removed
  • List via a variable known to be List.
    • the iteration management variable is stack allocated
    • bounds checking is performed
  • List via a variable known to be IList.
    • the iteration management variable is heap allocated
    • bounds checking is performed also Lists values may not be altered during the foreach whereas the array's can be.

The bounds checking is often no big deal (especially if you are on a cpu with a deep pipeline and branch prediction - the norm for most these days) but only your own profiling can tell you if that is an issue. If you are in parts of your code where you are avoiding heap allocations (good examples are libraries or in hashcode implementations) then ensuring the variable is typed as List not IList will avoid that pitfall. As always profile if it matters.

查看更多
流年柔荑漫光年
6楼-- · 2019-01-01 08:46

Do not attempt to add capacity by increasing the number of elements.

Performance

List For Add: 1ms
Array For Add: 2397ms

    Stopwatch watch;
        #region --> List For Add <--

        List<int> intList = new List<int>();
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 60000; rpt++)
        {
            intList.Add(rand.Next());
        }
        watch.Stop();
        Console.WriteLine("List For Add: {0}ms", watch.ElapsedMilliseconds);
        #endregion

        #region --> Array For Add <--

        int[] intArray = new int[0];
        watch = Stopwatch.StartNew();
        int sira = 0;
        for (int rpt = 0; rpt < 60000; rpt++)
        {
            sira += 1;
            Array.Resize(ref intArray, intArray.Length + 1);
            intArray[rpt] = rand.Next();

        }
        watch.Stop();
        Console.WriteLine("Array For Add: {0}ms", watch.ElapsedMilliseconds);

        #endregion
查看更多
旧人旧事旧时光
7楼-- · 2019-01-01 08:47

Indeed, if you perform some complex calculations inside the loop, then the performance of the array indexer versus the list indexer may be so marginally small, that eventually, it doesn't matter.

查看更多
登录 后发表回答