Our application uses plenty of dictionaries which have multi level lookup that are not frequently changing. We are investigating at converting some of the critical code that does a lot of lookup using dictionaries to use alternate structures for - faster lookup, light on the memory/gc. This made us compare various dictionaries/libraries available -
Dictionary
(System.Collections.Generics.Dictionary
-SCGD), ImmutableDictionary
, C5.HashDictionary
, FSharpMap
.
Running the following program with various items count - 100, 1000, 10000, 100000 - shows that Dictionary is still the winner at most ranges. First row indicates items in collection. MS/Ticks would the time taken to perform x lookups randomized (code is below).
Items - 100
SCGD - 0 MS - 50 Ticks
C5 - 1 MS - 1767 Ticks
Imm - 4 MS - 5951 Ticks
FS - 0 MS - 240 Ticks
Items - 1000
SCGD - 0 MS - 230 Ticks
C5 - 0 MS - 496 Ticks
Imm - 0 MS - 1046 Ticks
FS - 1 MS - 1870 Ticks
Items - 10000
SCGD - 3 MS - 4722 Ticks
C5 - 4 MS - 6370 Ticks
Imm - 9 MS - 13119 Ticks
FS - 15 MS - 22050 Ticks
Items - 100000
SCGD - 62 MS - 89276 Ticks
C5 - 72 MS - 103658 Ticks
Imm - 172 MS - 246247 Ticks
FS - 249 MS - 356176 Ticks
Does this mean, we are already using the fastest and don't have to change? I had presumed that immutable structures should be at the top of table, but it wasn't that way. Are we doing wrong comparison or am I missing something? Was holding on to this question, but felt, its better to ask. Any link or notes or any references that throws some light will be great.
Complete code for test -
namespace CollectionsTest
{
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Runtime;
using Microsoft.FSharp.Collections;
/// <summary>
///
/// </summary>
class Program
{
static Program()
{
ProfileOptimization.SetProfileRoot(@".\Jit");
ProfileOptimization.StartProfile("Startup.Profile");
}
/// <summary>
/// Mains the specified args.
/// </summary>
/// <param name="args">The args.</param>
static void Main(string[] args)
{
// INIT TEST DATA ------------------------------------------------------------------------------------------------
foreach (int MAXITEMS in new int[] { 100, 1000, 10000, 100000 })
{
Console.WriteLine("\n# - {0}", MAXITEMS);
List<string> accessIndex = new List<string>(MAXITEMS);
List<KeyValuePair<string, object>> listofkvps = new List<KeyValuePair<string, object>>();
List<Tuple<string, object>> listoftuples = new List<Tuple<string, object>>();
for (int i = 0; i < MAXITEMS; i++)
{
listoftuples.Add(new Tuple<string, object>(i.ToString(), i));
listofkvps.Add(new KeyValuePair<string, object>(i.ToString(), i));
accessIndex.Add(i.ToString());
}
// Randomize for lookups
Random r = new Random(Environment.TickCount);
List<string> randomIndexesList = new List<string>(MAXITEMS);
while (accessIndex.Count > 0)
{
int index = r.Next(accessIndex.Count);
string value = accessIndex[index];
accessIndex.RemoveAt(index);
randomIndexesList.Add(value);
}
// Convert to array for best perf
string[] randomIndexes = randomIndexesList.ToArray();
// LOAD ------------------------------------------------------------------------------------------------
// IMMU
ImmutableDictionary<string, object> idictionary = ImmutableDictionary.Create<string, object>(listofkvps);
//Console.WriteLine(idictionary.Count);
// SCGD
Dictionary<string, object> dictionary = new Dictionary<string, object>();
for (int i = 0; i < MAXITEMS; i++)
{
dictionary.Add(i.ToString(), i);
}
//Console.WriteLine(dictionary.Count);
// C5
C5.HashDictionary<string, object> c5dictionary = new C5.HashDictionary<string, object>();
for (int i = 0; i < MAXITEMS; i++)
{
c5dictionary.Add(i.ToString(), i);
}
//Console.WriteLine(c5dictionary.Count);
// how to change to readonly?
// F#
FSharpMap<string, object> fdictionary = new FSharpMap<string, object>(listoftuples);
//Console.WriteLine(fdictionary.Count);
// TEST ------------------------------------------------------------------------------------------------
Stopwatch sw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string i = randomIndexes[index];
object value;
dictionary.TryGetValue(i, out value);
}
sw.Stop();
Console.WriteLine("SCGD - {0,10} MS - {1,10} Ticks", sw.ElapsedMilliseconds, sw.ElapsedTicks);
Stopwatch c5sw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string key = randomIndexes[index];
object value;
c5dictionary.Find(ref key, out value);
}
c5sw.Stop();
Console.WriteLine("C5 - {0,10} MS - {1,10} Ticks", c5sw.ElapsedMilliseconds, c5sw.ElapsedTicks);
Stopwatch isw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string i = randomIndexes[index];
object value;
idictionary.TryGetValue(i, out value);
}
isw.Stop();
Console.WriteLine("Imm - {0,10} MS - {1,10} Ticks", isw.ElapsedMilliseconds, isw.ElapsedTicks);
Stopwatch fsw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string i = randomIndexes[index];
fdictionary.TryFind(i);
}
fsw.Stop();
Console.WriteLine("FS - {0,10} MS - {1,10} Ticks", fsw.ElapsedMilliseconds, fsw.ElapsedTicks);
}
}
}
}
Your presumption that immutable dictionaries would allow faster lookup is wrong, because the way almost all the immutable collections avoid to copy the whole structure on 'modification' is by storing data in a tree and only copying some of the nodes on 'modification', sharing all other nodes.
I know you are interested in read performance (of frozen dictionaries), but the tree characteristics of immutable collections show similarly in the write performance as documented:
Btw, if you run your sample code as is, the value for at least the immutable dictionary will be highly untypical in a normal (longer running) application, because for some reason the immutable dictionary apparently needs time to warm up. The difference in performance is enormous. Just have a look at the output of the first 3 runs:
Fwiw, I compared the single-thread read performance of
Dictionary<,>
,ConcurrentDictionary<,>
, andImmutableDictionary<,>
based on your code.The average results of 30 runs, after warmup:
To get a feel for write performance, I also ran a test which adds 50 more entries to the dictionaries. Again, the average results of 30 runs, after warmup:
Tested on
N.B. It should be noted that immutable dictionaries are so much faster and/or allow for higher levels of concurrency in many real life multi-threaded applications that otherwise would have to resort to slow or error-prone techniques like defensive copying, locking and the likes, in order to cope with mutability in the face of threads. This is especially true if you need snapshot-abilty such as for optimistic concurrency, MVCC.
The F# map structure is implemented as a binary tree and, as such, is not actually a dictionary. As noted here, a standard .net dictionary is the fastest you're going to get.
Here's some more benchmarks.
Dictionary
vsImmutableDictionary
:Dictionary
vsImmutableSortedDictionary
:I wanted to know how much slower immutable collections would be. Note that the entire run is for 100,000 insert operations. I'm happy to report that the lookup performance degradation is only 4x while the insert performance degradation is 10x, still pretty decent.
ImmutableDictionary
is clearly superior toImmutableSortedDictionary
unless you absolutely need a sorted data structure.Attachments
Program.cs:
The standard dictionary is already quite well optimized. All it really does when you do a lookup is calculate the hash of the provided key (the speed of which depends on the type of the key and how it implements
GetHashCode
), a modulo operation on the hash value to find the right bucket and then it iterates through the contents of the bucket until it finds the right value (the speed of which depends on the quality of theGetHashCode
function, so if the buckets are well balanced and don't contain too many items, and the speed of theEquals
method for the type).All in all, it doesn't do that much for lookups, so I don't think you'll be able to find a significantly faster generic data structure. However, it's possible that depending on how you plan to use the dictionaries, you're able to build a more specialized solution. For example, I needed a really fast lookup where the key was a type. Instead of using a dictionary and doing
dictionary[typeof(T)]
, I made a generic class like so:So I could just do
ValueStore<T>.Value
with pretty much zero lookup overhead.Whether or not you could do something similar (and whether it's worth it) really depends on your usecase; how many items the structure would hold, how often it's being read and written to, whether it needs to be threadsafe, how important write speed is, etc. For example, if write speed didn't matter at all but if thread safety was required, you would need to do a copy-on-write, where the data structure is never written to but instead copied, ensuring thread safety and lockless (thus: fast) reads, at the cost of write speeds. Specializing it could go as far as reordering the structure on writes to optimize it so the hash buckets don't contain more than N items.
PS: if you were really desperate for speed but couldn't build a more specialized data structure then you could perhaps get small gains from copying
Dictionary<TKey,TValue>
and removing the various sanity checks (null checks and such) and virtual/interface method calls. However, I doubt this would give you any more than 20% gain, if that.