Hiding vars in strings VS using objects with prope

2020-07-23 08:38发布

问题:

So, I've got a word analyzing program in Excel with which I hope to be able to import over 30 million words.

At first,I created a separate object for each of these words so that each word has a...

.value '(string), the actual word itself
.bool1 '(boolean) 
.bool2 '(boolean)
.bool3 '(boolean)
.isUsed '(boolean)
.cancel '(boolean)

When I found out I may have 30 million of these objects (all stored in a single collection), I thought that this could be a monster to compile. And so I decided that all my words would be strings, and that I would stick them into an array.

So my array idea is to append each of the 30 million strings by adding 5 spaces (for my 5 bools) at the beginning of each string, with each empty space representing a false bool val. e.g,

If instr(3, arr(n), " ") = 1 then   
'my 3rd bool val is false. 
Elseif instr(3, arr(n), "*") = 1 then '(I'll insert a '*' to denote true) 
'my third bool val is true.
End If

Anyway, what do you guys think? Which way (collection or array) should I go about this (for optimization specifically)?

回答1:

(I wanted to make this a comment but it became too long)

An answer would depend on how you want to access and process the words, once stored.

There are significant benefits and distinct advantages for 3 candidates:

  1. Arrays are very efficient to populate and retrieve all items at once (ex. range to array and array back to range), but much slower at re-sizing and inserting items in the middle. Each Redim copies the entire memory block to a larger location, and if Preserve is used, all values copied over as well. This may translate to perceived slowness for every operation (in a potential application)

    • More details (arrays vs collections) here (VB specific but it applies to VBA as well)
  2. Collections are linked lists with hash-tables - quite slow to populate but after that you get instant access to any element in the collection, and just as fast at reordering (sorting) and re-sizing. This can translate into a slow opening file, but all other operations are instant. Other aspects:

    • Retrieve keys as well as the items associated with those keys
    • Handle case-sensitive keys
    • Items can be other collections, arrays, objects
    • While keys must be unique, they are also optional
    • An item can be returned in reference to its key, or in reference to its index value
    • Keys are always strings, and always case insensitive
    • Items are accessible and retrievable, but its keys are not
    • Cannot remove all items at once (either one by one, or destroy then recreate the Collection
    • Enumerating with For...Each...Next, lists all items

    • More info here and here

  3. Dictionaries: same as collections but with the extra benefit of the .Exists() method which, in some scenarios, makes them much faster than collections. Other aspects:

    • Keys are mandatory and always unique to that Dictionary
    • An item can only be returned in reference to its key
    • The key can take any data type; for string keys, by default a Dictionary is case sensitive
    • Exists() method to test for the existence of a particular key (and item)

      • Collections have no similar test; instead, you must attempt to retrieve a value from the Collection, and handle the resulting error if the key is not found
    • Items AND keys are always accessible and retrievable to the developer
    • Item property is read/write, so it allows changing the item associated with a particular key
    • Allows you to remove all items in a single step without destroying the Dictionary itself
    • Using For...Each...Next dictionaries will enumerate the keys
    • A Dictionary supports implicit adding of an item using the Item property.

      • In Collections, items must be added explicitly
    • More details here


Other links: optimizing loops and optimizing strings (same site)