If I have, say, 100 items that'll be stored in a dictionary, should I initialise it thus?
var myDictionary = new Dictionary<Key, Value>(100);
My understanding is that the .NET dictionary internally resizes itself when it reaches a given loading, and that the loading threshold is defined as a ratio of the capacity.
That would suggest that if 100 items were added to the above dictionary, then it would resize itself when one of the items was added. Resizing a dictionary is something I'd like to avoid as it has a performance hit and is wasteful of memory.
The probability of hashing collisions is proportional to the loading in a dictionary. Therefore, even if the dictionary does not resize itself (and uses all of its slots) then the performance must degrade due to these collisions.
How should one best decide what capacity to initialise the dictionary to, assuming you know how many items will be inside the dictionary?
Yes. Unlike a Hashtable, which uses rehashing to resolve collisions, Dictionary uses chaining. So yes, it's good to use the count. For a Hashtable you probably want to use count * (1/fillfactor).
I did a quick test, probably not scientific: with the size set, it took 1.2207780 seconds to add one million items, and 1.5024960 seconds when I didn't give the Dictionary a size. This seems negligible to me.
Here is my test code; maybe someone can do a more rigorous test, but I doubt it matters.
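A comparison along these lines can be sketched as follows. This is only a minimal sketch, not the code behind the timings above: it assumes int keys and values, and the names CapacityBenchmark and TimeFill are made up for illustration. Timings will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class CapacityBenchmark
{
    // Adds `count` sequential integer keys to the dictionary and returns the elapsed time.
    public static TimeSpan TimeFill(Dictionary<int, int> dictionary, int count)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            dictionary.Add(i, i);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int count = 1000000;

        // Warm-up run so JIT compilation is not charged to the first measurement.
        TimeFill(new Dictionary<int, int>(), 1000);

        // Same workload, once with the capacity given up front and once without.
        TimeSpan presized = TimeFill(new Dictionary<int, int>(count), count);
        TimeSpan defaultSized = TimeFill(new Dictionary<int, int>(), count);

        Console.WriteLine("Pre-sized:    " + presized.TotalSeconds + " s");
        Console.WriteLine("Default size: " + defaultSized.TotalSeconds + " s");
    }
}
```

A more rigorous test would repeat each measurement several times and alternate the order of the two cases, since a single run is sensitive to garbage collection and cache effects.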
What you should initialize the dictionary capacity to depends on two factors: (1) the distribution of the GetHashCode function, and (2) how many items you have to insert.
Your hash function should either be randomly distributed, or it should be specially formulated for your set of input. Let's assume the first, but if you are interested in the second, look up perfect hash functions.
If you have 100 items to insert into the dictionary, a randomly distributed hash function, and you set the capacity to 100, then when you insert the ith item into the hash table you have a (i-1) / 100 probability that the ith item will collide with another item upon insertion. If you want to lower this probability of collision, increase the capacity. Doubling the expected capacity halves the chance of collision.
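To put rough numbers on that estimate, the per-insert collision chances can be summed over all inserts. The sketch below takes the (i-1) / capacity figure at face value: it assumes a uniformly random hash and treats the requested capacity as the number of buckets, which the real Dictionary may round up. The names CollisionEstimate and ExpectedCollisions are made up for illustration.

```csharp
using System;

public static class CollisionEstimate
{
    // Sums (i - 1) / capacity for i = 1..itemCount,
    // which equals itemCount * (itemCount - 1) / (2 * capacity).
    public static double ExpectedCollisions(int itemCount, int capacity)
    {
        long occupiedSlotsSeen = 0;
        for (int i = 1; i <= itemCount; i++)
        {
            // When the ith item arrives, at most (i - 1) slots are already taken.
            occupiedSlotsSeen += i - 1;
        }
        return (double)occupiedSlotsSeen / capacity;
    }

    public static void Main()
    {
        Console.WriteLine(ExpectedCollisions(100, 100)); // 49.5
        Console.WriteLine(ExpectedCollisions(100, 200)); // 24.75
    }
}
```

Under these assumptions, doubling the capacity from 100 to 200 halves the estimate from 49.5 to 24.75 colliding inserts, which matches the claim above.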
Furthermore, if you know how frequently you are going to be accessing each item in the dictionary you may want to insert the items in order of decreasing frequency since the items that you insert first will be on average faster to access.
The initial size is just a suggestion. For example, most hash tables like to have sizes that are prime numbers or a power of 2.
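One way to see this for Dictionary is to ask it what capacity it actually allocated. The sketch below assumes a runtime that has Dictionary<TKey,TValue>.EnsureCapacity (.NET Core 2.1 or later), which returns the current capacity; the names CapacityProbe and ActualCapacity are made up for illustration. The exact number returned is an implementation detail and may differ between runtime versions.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityProbe
{
    // Returns the capacity the dictionary really allocated for a requested size.
    // EnsureCapacity(0) does not grow the dictionary; it only reports the current capacity.
    public static int ActualCapacity(int requested)
    {
        var dictionary = new Dictionary<int, string>(requested);
        return dictionary.EnsureCapacity(0);
    }

    public static void Main()
    {
        int requested = 100;
        Console.WriteLine("Requested " + requested + ", allocated " + ActualCapacity(requested));
    }
}
```

The allocated value is never smaller than the requested one, so 100 items fit without a resize even if the dictionary picks a larger size internally.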
I think you're over-complicating matters. If you know how many items will be in your dictionary, then by all means specify that on construction. This will help the dictionary to allocate the necessary space in its internal data structures to avoid reallocating and reshuffling data.
Specifying the initial capacity to the Dictionary constructor increases performance because there will be fewer resizes of the internal structures that store the dictionary values during Add operations.
Considering that you specify an initial capacity of k to the Dictionary constructor, then:
Dictionary will reserve the amount of memory necessary to store k elements;
From MSDN: