I am testing how big a collection can be in .NET. In theory, any collection object should be able to grow to the size of the physical memory.
I ran the following code on a server with 16GB of memory, running Windows Server 2003 and Visual Studio 2008. I tested both F# and C# code and watched Task Manager while it ran. After the process grew to about 2GB of memory, the program crashed with an out-of-memory exception. I did set the target platform to x64 in the project's property page.
open System.Collections.Generic

let d = new Dictionary<int, int>()
for i = 1 to 1000000000 do
    d.Add(i, i)
I ran the same test against the C5 collection library. The result: the C5 dictionary could use up the whole memory. The C5 version of the code:
let d = C5.HashDictionary<int, int>()
for i = 1 to 1000000000 do
    d.Add(i, i)
Does anyone know why?
The Microsoft CLR has a 2GB maximum object size limit, even in the 64-bit version. (I'm not sure whether this limit is also present in other implementations such as Mono.)
The limitation applies to each single object -- not the total size of all objects -- which means that it's relatively easy to work around using a composite collection of some sort (a rough sketch follows the link below).
There's a discussion and some example code here...
- BigArray<T>, getting around the 2GB array size limit
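As a rough illustration of the composite approach (a hypothetical ChunkedList sketch, not the BigArray<T> from the linked post): storage is split across many smaller arrays, so no single array ever approaches the 2GB limit.

// Hypothetical sketch only: a list split into fixed-size chunks so that
// no single backing array ever approaches the 2GB object-size limit.
type ChunkedList<'T>(chunkSize: int) =
    let chunks = System.Collections.Generic.List<'T[]>()
    let mutable count = 0

    member this.Count = count

    member this.Add(item: 'T) =
        // start a new chunk whenever the current one is full
        if count % chunkSize = 0 then
            chunks.Add(Array.zeroCreate<'T> chunkSize)
        chunks.[count / chunkSize].[count % chunkSize] <- item
        count <- count + 1

    member this.Item
        with get (i: int) = chunks.[i / chunkSize].[i % chunkSize]
        and set (i: int) (v: 'T) = chunks.[i / chunkSize].[i % chunkSize] <- v

// usage: ~64MB per chunk of int, total growth limited only by physical memory
let big = ChunkedList<int>(16 * 1024 * 1024)
for i = 1 to 1000000000 do
    big.Add i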
There seems to be very little official documentation that refers to this limit. It is, after all, just an implementation detail of the current CLR. The only mention that I'm aware of is on this page:
"When you run a 64-bit managed application on a 64-bit Windows operating system, you can create an object of no more than 2 gigabytes (GB)."
In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled in the application config (example below). Note that the limit for strings is not lifted, but "arrays" should cover "lists" too, since lists are backed by arrays.
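For reference, the switch lives in the application configuration file; this is the documented runtime element for .NET Framework 4.5+ (it only takes effect in 64-bit processes):

<configuration>
  <runtime>
    <!-- .NET 4.5+, 64-bit only: allow objects (arrays) larger than 2GB -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>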
And to be clear, a Dictionary uses a single backing array to store the pairs. It is grown (roughly doubled) each time it fills up. At around 512 million entries its size reaches 2GB (assuming roughly 4 bytes per slot and a well-distributed hash). Adding one more element makes the Dictionary try to double the array again. Boom.
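A quick way to see where the resize gives up is to catch the exception and report the count; this is a sketch using the same Dictionary<int, int> as in the question:

open System
open System.Collections.Generic

// Catch the failure and report how many entries fit before the resize blew up.
let d = Dictionary<int, int>()
try
    for i = 1 to 1000000000 do
        d.Add(i, i)
with :? OutOfMemoryException ->
    printfn "Out of memory after %d entries" d.Count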
The C5 HashDictionary uses linear hashing and probably uses an array of buckets, each holding several (16?) elements, so it runs into the same problem, just (much) later.
The "allow large objects" will only help to get rid of OOM exception.
When one needs to store very many objects the problem that you will see is
GC stalls(pauses). What we have done is "hiding" of data from GC, which turned into
a very practical solution.
See this: https://www.infoq.com/articles/Big-Memory-Part-3
You can use a cache that works like a dictionary:
https://github.com/aumcode/nfx/tree/master/Source/NFX/ApplicationModel/Pile
(see the caching section)
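To illustrate the "hide it from the GC" idea (a minimal sketch only, not the NFX Pile API; the function names are hypothetical): each value is copied into unmanaged memory and only a small IntPtr handle is kept on the managed side, so the GC never has to trace millions of object references.

open System
open System.Runtime.InteropServices

// Hypothetical sketch (not the NFX Pile API): park each value as a raw blob in
// unmanaged memory and keep only an IntPtr handle on the managed side, so the
// GC never has to trace millions of object references.
let store (bytes: byte[]) : IntPtr =
    let p = Marshal.AllocHGlobal(bytes.Length)
    Marshal.Copy(bytes, 0, p, bytes.Length)
    p

let load (p: IntPtr) (length: int) : byte[] =
    let buffer = Array.zeroCreate<byte> length
    Marshal.Copy(p, buffer, 0, length)
    buffer

let free (p: IntPtr) =
    Marshal.FreeHGlobal(p)

A real implementation such as Pile adds serialization, pooling, and cache eviction on top of this, and you are responsible for tracking blob lengths and freeing the handles yourself.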