Generics memory management

2020-03-01 19:34发布

站内文章 / C#

13 0

啃猪蹄的小仙女

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have question regarding how memory is managed for strong type Generics

List<int> ints1 = new List<int>();
ints1.Add(1); ints1.Add(2); ints1.Add(3);

int[] ints2 = new int[10]();
ints2.Add(6); ints2.Add(7); ints2.Add(8);

Question 1: I presume as ints1 is initialised with a new keyword (new List<int>()) it becomes a reference type. Where are the values 1,2,3 are stored in memory (are they stored in stack or on heap)?

Question 2: I know of a thing between List<int> and int[], List<int> can scale its size to any size at run time, which is restricted in int[] at compile time. So, if the values 1,2,3 and stored in stack, if a new item is added to List<int> say 4, it wouldn't be continuous to the first 3 right, so how will ints1 know the memory location of 4?

回答1:

I presume as ints1 is initialised with a new keyword new List<int>() it becomes a reference type

This presumption is incorrect. You can use the "new" keyword on a value type too!

int x = new int();

Using "new" does not make anything a reference type. You can use "new" with reference types or value types. What "new" indicates is that storage is going to be allocated and a constructor is going to be called.

In the case of using "new" on a value type, the allocated storage is temporary storage. A reference to that temporary storage is passed to the constructor, and then the now-initialized result is copied to its final destination, if there is one. ("new" is usually used with an assignment but it need not be.)

In the case of a reference type, storage is allocated twice: long-term storage is allocated for the instance and short-term storage is allocated for the reference to long-term storage of the instance. The reference is passed to the constructor, which initializes the long-term storage. The reference is then copied from short-term storage to its final destination, if there is one.

What makes List<int> a reference type is that List<T> is declared as a class.

Where are the values 1,2,3 are stored in memory (are they stored in stack or on heap)?

We've worked hard to make a memory manager that lets you not care where things are stored. Values are stored in either a short-term memory pool (implemented as the stack or registers) or a long-term memory pool (implemented as a garbage-collected heap). Storage is allocated depending on the known lifetime of the value. If the value is known to be short-lived then its storage is allocated on the short-term pool. If the value is not known to be short-lived then it must be allocated on the long-term pool.

The 1, 2, 3 owned by the list could live forever; we do not know whether that list is going to outlive the current activation frame or not. Therefore the memory to store the 1, 2, 3 is allocated on the long-term pool.

Do not believe the lie that "value types are always allocated on the stack". Obviously that cannot be true because then a class or array containing a number could not survive the current stack frame! Value types are allocated on the pool that makes sense for their known lifetime.

List<int> can scale its size to any size at runtime unlike int[]

Correct. It is educational to see how List<T> does that. It simply allocates an array of T larger than it needs. If it discovers that it guessed too small, it allocates a new, larger array and copies the old array contents to the new one. A List<T> is just a convenient wrapper around a bunch of array copies!

if the values 1,2,3 were stored in stack, and a new item 4 is added to the list, then it wouldn't be continuous to the first three.

Correct. That's one reason why the storage for values 1, 2, 3 are not allocated on the stack. The storage is actually an array allocated on the heap.

so how will the list know the memory location of item 4?

The list allocates an array that is too big. When you add a new item, it sticks it into unused space in the too-big array. When the array runs out of room, it allocates a new array.

回答2:

The "new" syntax is used to initialize both value-types and reference-types. The new list is created on the heap; the values are loaded on the stack (i.e. before they are added to the list), but once added, they are on the heap, in the int[] that underpins the list. Arrays are always on the heap.

The fact that they are copied to the array also answers part 2 I believe. The array is over-sized, and reallocated only when full.

Note; List<int> doesn't "become" a reference-type; it is always a reference-type.

回答3:

Memory management for generics (Generic collections) is exactly the same as for non-generic types.

Your ints1 list uses an array under the covers. So it is the same as for ints2 (when it has been corrected). In both cases a block of memory on the Heap is holding the int values.

The List<> class consists of an array, a int Count and an int Capacity property. When you Add() an element Count is incremented, when it passes Capacity a new array is allocated and the contents are copied.

回答4:

Question 1: http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx says:

The List class is the generic equivalent of the ArrayList class. It implements the IList generic interface using an array whose size is dynamically increased as required.

This looks like a simple array, that is just reallocated if it overflows. AFAIKR the size is doubled on every reallocation - I researched that once, but can't remember what for.

The array is allocated on the managed heap, just as it would if you just declared it.

回答5:

List is a reference type no matter how you see it. All these types are allocated on the heap. I do not know whether the C# compiler is clever enough yet to figure out that an object which is not used outside of a method can be allocated on the stack, (Eric Lippert might be able to tell us,) but even if it does, that's something that you, as a programmer, do not need to worry about. It will just be an optimization that the compiler will do for you, without you ever noticing.
An array of int is also a reference type and it is also allocated on the heap, it is just as simple as that. There is no point in wondering about some hypothetical fragmentation of arrays in the stack, because they are simply not allocated in the stack.

回答6:

List<int> ints1 = new List<int>();
I presume as ints1 is initialised with a new keyword (new List()) it becomes a reference type.

Let's clear up on some terminology first:

ints1 is not a type of any sort, but a variable, so it cannot "become a reference type", ever.
Variables have a static ("compile-time") type. ints1, for example, was declared to be of type List<int>. The static type of a variable never changes.
The type of the value or object that a variable refers to is called its dynamic ("run-time") type. After the above assignment, the dynamic type of ints1 is List<int>. The dynamic type of a variable may change whenever a new value or object is assigned to it.

In the case of ints1, both types happen to be equal, but this isn't always so:

ICollection<int> ints3 = new List<int>();

Here, the static type is ICollection<int> (always), and the dynamic type List<int> (after the assignment).

Answer to Question 1:

ints1.Add(1); ints1.Add(2); ints1.Add(3);
Where are the values 1,2,3 are stored in memory (are they stored in stack or on heap)?

The official answer is that this should be considered an implementation detail of List<T> and should not matter to you. All you need to know about is the operations you can perform on a List<T> (such as Add, Clear, etc.) and their characteristics (such as pre-/post-conditions, performance, etc.).

If you still want to know how this type works internally, let's start with this hint:

The List<T> class […] implements the IList<T> generic interface using an array whose size is dynamically increased as required. — MSDN reference page for List<T>, section Remarks

Meaning, you can imagine a List<T> to be an array of type T[] that grows its capacity when needed. In the case of ints1, think of an int[]. Because int is a value type, the concrete values in the list (1, 2, 3) will be stored in-place. If it were a reference type, the values would each be stored in a separate location and the array would only contain references.

Note that I haven't mentioned the terms "stack", nor "heap" in the above paragraph. This is because these concepts are another implementation detail, namely one of the .NET platform, that you're not supposed to care too much about. (See Eric Lippert's blog article series, "The stack is an implementation detail", and e.g. my answer to a previous question, "Does new always allocate on the heap in C++ / C# / Java?")

Answer to Question 2:

List can scale its size to any size at run time, which is restricted in int[] at compile time. So, if the values 1,2,3 and stored in stack, if a new item is added to List say 4, it wouldn't be continuous to the first 3 right, so how will ints1 know the memory location of 4?

Again, you're not required to think about that. What matters about List<int> is whether its particular characteristics and operations that it offers suit your purposes. But I wouldn't really have answered your question if I left it at that, so let's briefly think about this anyway:

When you talk about the "stack", you probably mean an executing thread's call stack. AFAIK, virtual machines and programming languages that make use of such a data structure mostly use it for passing arguments to functions/methods, and perhaps also for storing values that stay local to a function. This is not the same as saying that the call stack also gets used for local variables, because a local variable may contain an object that is returned to the calling function, either via the return value, or via a ref/out parameter, etc.

To me, it seems unlikely that a List<int> object would ever end up on a call stack under normal circumstances (I'm not considering stackalloc or anything else related to unsafe contexts btw.). Remember that List<T> is a reference type: While it's possible, and perhaps quite likely, that the reference to the actual object ends up on the call stack, most likely the object's data itself won't.

A (perhaps naïve) implementation of IList<T> that uses fixed-size arrays for item storage, faced with the need to increase its capacity, could dynamically allocate a new, larger array, then copy the contents of the current array over to the new array, then abandon the old array in favor of the new one. That way, all items remain together in one place.

回答7:

Question 1

Lists in C# internally contain arrays. List refers to a location on the heap, and in that location is an array storing all the values. So the values are stored on the heap. Same goes for arrays if it's part of a class.

Question 2

The stack is continuous, so pushing another int on the stack will mean that it's memory address is the location of the previous int + 4. The way Lists work when you adding items is that they create an array larger than what you need. When you reach the length of the array, there's an algorithm that creates a larger array and copies the current values over.

Another thing that may interest you are linked lists. Linked lists don't work with arrays internally, instead they work with nodes. Each node contains data and the location of the next node in the list. A doubly linked list contains nodes with all that and the location of the previous node in the list.