Performance alternative over Scripting.Dictionary

I am coding a Manager in Excel-VBA with several buttons.

One of them is to generate a tab using another Excel file (let me call it T) as input.

Some properties of T:

~90MB size
~350K lines
Contains sales data of the last 14 months (unordered).
Relevant columns:
- year/month
- total-money
- seller-name
- family-product
- client-name
There is not id columns (like: cod-client, cod-vendor, etc.)

Main relation:

Sellers sells many Products to many Clients

I am generating a new Excel tab with data from T of the last year/month grouped by Seller.

Important notes:

T is the only available input/source.
If two or more Sellers sells the same Product to the same Client, the total-money should be counted to all of those Sellers.

This is enough, now you know what I have already coded.

My code works, but, it takes about 4 minutes of runtime.

I have already coded some other buttons using smaller sources (not greater than 2MB) which runs in 5 seconds.

Considering T size, 4 minutes runtime could be acceptable.

But I'm not proud of it, at least not yet.

My code is mainly based on Scripting.Dictionary to map data from T, and then I use for each key in obj ... next key to set the grouped data to the new created tab.

I'm not sure, but here are my thoughts:

If N is the total keys in a Scripting.Dictionary, and I need to check for obj.Exists(str) before aggregating total-money. It will run N string compares to return false.
Similarly it will run maximun N string compares when I do Set seller = obj(seller_name).

I want to be wrong with my thoughts. But if I'm not wrong, my next step (and last hope) to reduce the runtime of this function is to code my own class object with Tries.

I will only start coding tomorrow, what I want is just some confirmation if I am in the right way, or some advices if I am in the wrong way of doing it.

Do you have any suggestions? Thanks in advance.

标签： excel-vba dictionary vba excel

1条回答

相关推荐>>

2楼-- · 2019-08-04 16:48

Memory Limit Exceeded

In short:

The main problem was because I used a dynamic programming approach of storing information (preprocessing) to make the execution time faster.
My code now runs in ~ 13 seconds.

There are things we learn the hard way. But I'm glad I found the answer.

Using the Task Manager I was able to see my code reaching 100% memory usage.
The DP approach I mentioned above using Scripting.Dictionary reached 100% really faster.
The DP approach I mentioned above using my own cls_trie implementation also reached 100%, but later than the first.
This explains the ~4-5 min compared to ~2-3 min total runtime of above attempts.
In the Task Manager I could also see that the CPU usage never hited 2%.

Solution was simple, I had to balance CPU and Memory usages.

I changed some DP approaches to simple for-loops with if-conditions.
The CPU usage now hits ~15%.
The Memory usage now hits ~65%.
I know this is relative to the CPU and Memory capacity of each machine. But in the client machine it is also running in no more than 15 seconds now.

I created one GitHub repository with my cls_trie implementation and added one excel file with an example usage.

I'm new to the excel-vba world (4 months working with it right now). There might probably have some ways to improve my cls_trie implementation, I'm openned to suggestions:

Option Explicit

Public Keys As Collection
Public Children As Variant
Public IsLeaf As Boolean

Public tObject As Variant
Public tValue As Variant

Public Sub Init()
    Set Keys = New Collection
    ReDim Children(0 To 255) As cls_trie
    IsLeaf = False

    Set tObject = Nothing
    tValue = 0
End Sub

Public Function GetNodeAt(index As Integer) As cls_trie
    Set GetNodeAt = Children(index)
End Function

Public Sub CreateNodeAt(index As Integer)
    Set Children(index) = New cls_trie
    Children(index).Init
End Sub

'''
'Following function will retrieve node for a given key,
'creating a entire new branch if necessary
'''
Public Function GetNode(ByRef key As Variant) As cls_trie
    Dim node As cls_trie
    Dim b() As Byte
    Dim i As Integer
    Dim pos As Integer

    b = CStr(key)
    Set node = Me

    For i = 0 To UBound(b) Step 2
        pos = b(i) Mod 256

        If (node.GetNodeAt(pos) Is Nothing) Then
            node.CreateNodeAt pos
        End If

        Set node = node.GetNodeAt(pos)
    Next

    If (node.IsLeaf) Then
        'already existed
    Else
        node.IsLeaf = True
        Keys.Add key
    End If

    Set GetNode = node
End Function

'''
'Following function will get the value for a given key
'Creating the key if necessary
'''
Public Function GetValue(ByRef key As Variant) As Variant
    Dim node As cls_trie
    Set node = GetNode(key)
    GetValue = node.tValue
End Function

'''
'Following sub will set a value to a given key
'Creating the key if necessary
'''
Public Sub SetValue(ByRef key As Variant, value As Variant)
    Dim node As cls_trie
    Set node = GetNode(key)
    node.tValue = value
End Sub

'''
'Following sub will sum up a value for a given key
'Creating the key if necessary
'''
Public Sub SumValue(ByRef key As Variant, value As Variant)
    Dim node As cls_trie
    Set node = GetNode(key)
    node.tValue = node.tValue + value
End Sub

'''
'Following function will validate if given key exists
'''
Public Function Exists(ByRef key As Variant) As Boolean
    Dim node As cls_trie
    Dim b() As Byte
    Dim i As Integer

    b = CStr(key)
    Set node = Me

    For i = 0 To UBound(b) Step 2
        Set node = node.GetNodeAt(b(i) Mod 256)

        If (node Is Nothing) Then
            Exists = False
            Exit Function
        End If
    Next

    Exists = node.IsLeaf
End Function

'''
'Following function will get another Trie from given key
'Creating both key and trie if necessary
'''
Public Function GetTrie(ByRef key As Variant) As cls_trie
    Dim node As cls_trie
    Set node = GetNode(key)

    If (node.tObject Is Nothing) Then
        Set node.tObject = New cls_trie
        node.tObject.Init
    End If

    Set GetTrie = node.tObject
End Function

You can see in the above code:

I hadn't implemented any delete method because I didn't need it till now. But it would be easy to implement.
I limited myself to 256 children because in this project the text I'm working on is basically lowercase and uppercase [a-z] letters and numbers, and the probability that two text get mapped to the same branch node tends zero.

as a great coder said, everyone likes his own code even if other's code is too beautiful to be disliked ^[1]

My conclusion

I will probably never more use Scripting.Dictionary, even if it is proven that somehow it could be better than my cls_trie implementation.

Thank you all for the help.

0人赞添加讨论(0) 举报

Performance alternative over Scripting.Dictionary

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间