So, I've got a word analyzing program in Excel with which I hope to be able to import over 30 million words.
At first,I created a separate object for each of these words so that each word has a...
.value '(string), the actual word itself
.bool1 '(boolean)
.bool2 '(boolean)
.bool3 '(boolean)
.isUsed '(boolean)
.cancel '(boolean)
When I found out I may have 30 million of these objects (all stored in a single collection), I thought that this could be a monster to compile. And so I decided that all my words would be strings, and that I would stick them into an array.
So my array idea is to append each of the 30 million strings by adding 5 spaces (for my 5 bools) at the beginning of each string, with each empty space representing a false bool val. e.g,
If instr(3, arr(n), " ") = 1 then
'my 3rd bool val is false.
Elseif instr(3, arr(n), "*") = 1 then '(I'll insert a '*' to denote true)
'my third bool val is true.
End If
Anyway, what do you guys think? Which way (collection or array) should I go about this (for optimization specifically)?
(I wanted to make this a comment but it became too long)
An answer would depend on how you want to access and process the words, once stored.
There are significant benefits and distinct advantages for 3 candidates:
Arrays are very efficient to populate and retrieve all items at once (ex. range to array and array back to range), but much slower at re-sizing and inserting items in the middle. Each Redim copies the entire memory block to a larger location, and if Preserve is used, all values copied over as well. This may translate to perceived slowness for every operation (in a potential application)
- More details (arrays vs collections) here (VB specific but it applies to VBA as well)
Collections are linked lists with hash-tables - quite slow to populate but after that you get instant access to any element in the collection, and just as fast at reordering (sorting) and re-sizing. This can translate into a slow opening file, but all other operations are instant. Other aspects:
Dictionaries: same as collections but with the extra benefit of the .Exists() method which, in some scenarios, makes them much faster than collections. Other aspects:
Other links: optimizing loops and optimizing strings (same site)