What is the memory layout of a .NET array?
Take for instance this array:
Int32[] x = new Int32[10];
I understand that the bulk of the array is like this:
0000111122223333444455556666777788889999
Where each character is one byte, and the digits corresponds to indices into the array.
Additionally, I know that there is a type reference, and a syncblock-index for all objects, so the above can be adjusted to this:
ttttssss0000111122223333444455556666777788889999
^
+- object reference points here
Additionally, the length of the array needs to be stored, so perhaps this is more correct:
ttttssssllll0000111122223333444455556666777788889999
^
+- object reference points here
Is this complete? Are there more data in an array?
The reason I'm asking is that we're trying to estimate how much memory a couple of different in-memory representations of a rather large data corpus will take and the size of the arrays varies quite a bit, so the overhead might have a large impact in one solution, but perhaps not so much in the other.
So basically, for an array, how much overhead is there, that is basically my question.
And before the arrays are bad squad wakes up, this part of the solution is a static build-once-reference-often type of thing so using growable lists is not necessary here.
Great question. I found this article which contains block diagrams for both value types and reference types. Also see this article in which Ritcher states:
An array object would have to store how many dimensions it has and the length of each dimension. So there is at least one more data element to add to your model
One way to examine this is to look at the code in WinDbg. So given the code below, let's see how that appears on the heap.
The first thing to do is to locate the instance. As I have made this a local in
Main()
, it is easy to find the address of the instance.From the address we can dump the actual instance, which gives us:
This tells us that it is our Int32 array with 10 elements and a total size of 52 bytes.
Let's dump the memory where the instance is located.
I have inserted brackets for the 52 bytes.
Edit: Forgot length in first posting.
The listing is slightly incorrect because as romkyns points out the instance actually begins at the address - 4 and the first field is the Syncblock.
Great question! I wanted to see it for myself, and it seemed a good opportunity to try out CorDbg.exe...
It seems that for simple integer arrays, the format is:
where s is the sync block, l the length of the array, and then the individual elements. It seems that there is a finally 0 at the end, I'm not sure why that is.
For multidimensional arrays:
where s is the sync block, t the total number of elements, l1 the length of the first dimension, l2 the length of the second dimension, then two zeroes?, followed by all the elements sequentially, and finally a zero again.
Object arrays are treated as the integer array, the contents are references this time. Jagged arrays are object arrays where the references point to other arrays.