VBA: Iteration speed of variant array vs. typed ar

2019-02-06 15:38发布

问题:

My project requires a bunch of dynamically-resizable arrays for different objects. An array may hold any number of objects, potentially thousands, of a single class, but not objects of multiple classes.

Mostly I will be iterating through arrays, thus use of a keyed collection is not ideal. I think I have two options:

The first option is to develop a 'List' class for each object type, with methods for adding objects (and expanding the array), getting the First and Last indexes and the object count, and retrieving an object by index (the latter 4 will include error handling in case the array is empty).

The second option is to develop a single 'List' class, with the same methods, using the Variant data type. Obviously this is a lot less work, but I am concerned about speed. How much slower is it to use variants than typed objects? Note that I will always be casting the variant objects in the array directly to a typed variable upon retrieval, a la:

Dim myObject As MyClass
Set myObject = variantList.Get(i)

Does casting improve speed, or does vba still have to perform all the type-checking associated with variants?

Also, would this second option be faster than using a non-keyed Collection? I have read that Collection iteration is slow, that they are designed for lookup. Does this apply to non-keyed collections, or only key-value-mapped collections?

Thanks to anyone who can offer advice.

回答1:

I followed Tim Williams's advice and did some speed tests.

For each type of collection/array, I first added 100,000 objects of class "SpeedTester", which was simply a shell object holding a long variable (with get/set properties). The value of the variable was the value of the loop index (between 1 and 100,000)

Then I did a second loop, which involved accessing each object in the collection/array and assigning the object's long property value to a new variable of type long. I performed 3 rounds per method, and averaged the times for the And and get loops.

The results are as follows:

Method                      Avg Add Time    Avg Get Time    Total Time
Collection Indexed             0.305          25.498         25.803
Collection Mapped              1.021           0.320          1.342
Collection Indexed For Each    0.334           0.033          0.367
Collection Mapped For Each     1.084           0.039          1.123
Dynamic Array Typed            0.303           0.039          0.342
Static Array Typed             0.251           0.016          0.266

The methods Collection Indexed and Collection Mapped involved holding the objects in a collection. The first were added with no key, the second was added with a key which was the object's long property converted into a string. These objects were then accessed in a for-loop using an index from 1 to c.Count

The next two methods were identical to the first two in the way variables were added to the collection. However, for the Get loop, instead of using a for-loop with an index, I used a for-each loop.

Dynamic array typed was a custom class containing an array of type SpeedTester. Each time a variable is added, the size of the array was expanded by 1 slot (using ReDim Preserve). The get-loop was a for-loop using an index from 1 to 100,000, as is typical for an array.

Finally the static array typed was simply an array of type SpeedTester, which was initialised with 100,000 slots. Obviously this is the fastest method. Strangely enough, much of its speed gains were in Getting rather than Adding. I would have assumed that adding would be slower for the other methods, due to the need for resizing, while Getting each object would be no faster than a dynamic array.

I was astounded by the difference between using a for-loop and a for-each loop to access an indexed collection's objects. I was also suprised by the mapped collection's key lookup speed - much, much faster than indexing and comparable to all other methods except the static array.

In short, they're all viable alternatives for my project (except for the 1st and last methods, first because of its slowness, last because I need dynamically resizable arrays). I know absolutely nothing about how the collections are actually implemented, or the implementation differences between a dynamic and static array. Any further insight would be much appreciated.

EDIT: The code for the test itself (using the dynamic array)

Public Sub TestSpeed()
    Dim ts As Double
    ts = Timer()

    Dim c As TesterList
    Set c = New TesterList

    Dim aTester As SpeedTester

    Dim i As Long
    For i = 1 To 100000
        Set aTester = New SpeedTester
        aTester.Number = i

        Call c.Add(aTester)
    Next i

    Dim taa As Double
    taa = Timer()

    For i = c.FirstIndex To c.LastIndex
        Set aTester = c.Item(i)

        Dim n As Long
        n = aTester.Number
    Next i

    Dim tag As Double
    tag = Timer()

    MsgBox "Time to add: " & (taa - ts) & vbNewLine & "Time to get: " & (tag - taa)
End Sub

And for the dynamic array class TesterList:

Private fTesters() As SpeedTester

Public Property Get FirstIndex() As Long
    On Error GoTo Leave

    FirstIndex = LBound(fTesters)

Leave:
    On Error GoTo 0
End Property

Public Property Get LastIndex() As Long
    On Error GoTo Leave

    LastIndex = UBound(fTesters)

Leave:
    On Error GoTo 0
End Property

Public Sub Add(pTester As SpeedTester)
    On Error Resume Next

    ReDim Preserve fTesters(1 To UBound(fTesters) + 1) As SpeedTester
    If Err.Number <> 0 Then
        ReDim fTesters(1 To 1) As SpeedTester
    End If

    Set fTesters(UBound(fTesters)) = pTester

    On Error GoTo 0
End Sub

Public Function Item(i As Long) As SpeedTester
    On Error GoTo Leave

    Set Item = fTesters(i)

Leave:
    On Error GoTo 0
End Function

And finally, the very simple SpeedTester object class:

Private fNumber As Long

Public Property Get Number() As Long
    Number = fNumber
End Property

Public Property Let Number(pNumber As Long)
    fNumber = pNumber
End Property