Garbage Collection

Posted 2019-03-13 13:22

Question:

There are a few things about garbage collection that I can't understand.

Firstly, how is data allocated space, i.e. on the stack or on the heap? (As far as I know, all static or global variables are assigned space on the stack and local variables are assigned space on the heap.)

Second, does GC run on data on the stack or on the heap? I.e., a GC algorithm like mark/sweep would treat data on the stack as the root set, right? And then it would map all the reachable variables on the heap by checking which variables on the heap refer to the root set.

What if a program does not have a global variable? How does the algorithm work then?

Regards, darkie

Answer 1:

It might help to clarify what platform's GC you are asking about - JVM, CLR, Lisp, etc. That said:

First, to take a step back: certain local variables are generally allocated on the stack. The specifics vary by language, however. To take C# as an example, only local value types and method parameters are stored on the stack. So, in C#, foo would be allocated on the stack:

public void Bar()
{
    int foo = 2;    // local value type: stored in Bar's stack frame
    // ...
}

Alternatively, dynamically allocated objects use memory from the heap. This makes intuitive sense: otherwise the stack would have to grow dynamically each time "new" is called. It would also mean that such objects could only be used as locals within the function that allocated them, which is of course not true, because we can have (for example) class member variables. So, to take another C# example, in the following case the MyInt object that result refers to is allocated on the heap:

public class MyInt
{         
    public int MyValue;
}

// ... (inside some method)
MyInt result = new MyInt();    // the MyInt object itself lives on the heap
result.MyValue = foo + 40;
// ...
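Since the thread asks about both the JVM and the CLR, here is a rough Java analogue of the MyInt example (the class names MyInt and AllocationDemo are hypothetical). It shows the point made above: the primitive local foo belongs to the method's stack frame, while the object created with "new" is on the heap and outlives the frame that created it. This is the conceptual model only; a JIT compiler may optimize some allocations away.

```java
// Hypothetical Java analogue of the C# MyInt example.
class MyInt {
    int myValue;
}

class AllocationDemo {
    static MyInt makeResult() {
        int foo = 2;                  // primitive local: part of this stack frame
        MyInt result = new MyInt();   // the object is allocated on the heap;
                                      // only the reference `result` is a local
        result.myValue = foo + 40;
        return result;                // the heap object survives after this frame is popped
    }

    static int readBack() {
        MyInt r = makeResult();       // makeResult's frame is gone, its object is not
        return r.myValue;
    }
}
```

Calling AllocationDemo.readBack() returns 42 even though the frame that allocated the object has already returned.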

Now, with that background in mind: memory on the heap is garbage-collected. Memory on the stack does not need GC, because it is reclaimed when the current function returns. At a high level, a GC algorithm works by keeping track of all objects that are dynamically allocated on the heap. Once allocated via "new", an object is tracked by the GC, and it is collected once it is no longer reachable, that is, once no live reference to it remains.
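The "track every allocation, collect what is unreachable" idea can be sketched with mark and sweep, the algorithm the question mentions. This is a toy model, not how the JVM or CLR is implemented: ToyHeap, Obj, allocate and collect are hypothetical names, and the roots set stands in for whatever the runtime treats as roots (stack slots, globals, registers).

```java
import java.util.*;

// Toy mark-and-sweep heap (hypothetical, for illustration only).
class ToyHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();  // outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    private final List<Obj> allocated = new ArrayList<>(); // every tracked object
    final Set<Obj> roots = new HashSet<>();                 // stand-in for stack slots/globals

    // Plays the role of "new": the object is tracked from the moment it is created.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        allocated.add(o);
        return o;
    }

    // Mark phase: traverse everything reachable from the roots.
    private void mark() {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    // Sweep phase: free unmarked objects and reset marks on survivors.
    // Returns the names of the freed objects in allocation order.
    List<String> collect() {
        mark();
        List<String> freed = new ArrayList<>();
        Iterator<Obj> it = allocated.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                freed.add(o.name);
                it.remove();
            }
        }
        return freed;
    }

    int liveCount() { return allocated.size(); }
}
```

If a root refers to a, and a refers to b, then a collection frees only an unreferenced c: b survives because it is reachable through a, even though no root points at it directly.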



Answer 2:

Check out the book "Garbage Collection: Algorithms for Automatic Dynamic Memory Management".



Answer 3:

Firstly, how is data allocated space, i.e. on the stack or on the heap? (As far as I know, all static or global variables are assigned space on the stack and local variables are assigned space on the heap.)

No, the stack holds method call frames and their local variables. A stack frame is created when a method is called and popped off when it returns.
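To make the push/pop behaviour concrete, here is a toy call-stack model (ToyCallStack and its methods are hypothetical names; a real JVM frame also holds an operand stack and a return address). A call pushes a frame of locals, and a return pops the whole frame at once, which is why stack memory needs no garbage collector.

```java
import java.util.*;

// Toy model of a call stack (hypothetical): one map of locals per frame.
class ToyCallStack {
    private final Deque<Map<String, Integer>> frames = new ArrayDeque<>();

    void call() { frames.push(new HashMap<>()); }          // new frame for the callee
    void setLocal(String name, int value) { frames.peek().put(name, value); }
    Integer getLocal(String name) { return frames.peek().get(name); } // current frame only
    void ret() { frames.pop(); }                           // entire frame discarded, no GC
    int depth() { return frames.size(); }
}
```

A callee cannot see its caller's locals, and once it returns, everything it declared is gone in a single pop.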

Memory in Java and C# is allocated on the heap by calling "new".

Second, does GC run on data on the stack or on the heap? I.e., a GC algorithm like mark/sweep would treat data on the stack as the root set, right? And then it would map all the reachable variables on the heap by checking which variables on the heap refer to the root set.

GC is used on the heap.

Mark and sweep would not be considered a cutting-edge GC algorithm. Both the Java and .NET garbage collectors use generational models now.
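As a rough illustration of the generational idea (not the actual JVM or .NET design), the sketch below puts new objects in a young generation. A minor collection traces only young objects reachable from the roots, promotes the survivors to the old generation, and frees the rest. All names are hypothetical, and it assumes no old-to-young references exist; real collectors track those with a write barrier and a remembered set.

```java
import java.util.*;

// Highly simplified generational heap (hypothetical).
// ASSUMPTION: old objects never reference young ones, so old objects need not be scanned.
class ToyGenerationalHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final Set<Obj> roots = new HashSet<>();
    private final Set<Obj> young = new HashSet<>();
    private final Set<Obj> old = new HashSet<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o);                 // every new object starts out young
        return o;
    }

    // Minor collection: follow references only through young objects.
    // Returns how many young objects were freed.
    int minorCollect() {
        Set<Obj> reached = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            // Skip old objects (assumed live) and young objects already visited.
            if (!young.contains(o) || !reached.add(o)) continue;
            work.addAll(o.refs);
        }
        int freed = young.size() - reached.size();
        old.addAll(reached);          // survivors are promoted
        young.clear();                // unreached young objects are dropped
        return freed;
    }

    int youngCount() { return young.size(); }
    int oldCount() { return old.size(); }
}
```

The payoff is that a minor collection does work proportional to the young objects it reaches rather than to the whole heap, which suits the common case where most objects die young.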

What if a program does not have a global variable? How does the algorithm work then?

What does "global variable" mean in languages like Java and C# where everything belongs to a class?

The root of the object graph is arbitrary. I'll admit that I don't know how it's chosen.



Answer 4:

Read this article. It is a very good survey of uniprocessor garbage collection techniques, and it will give you the basic understanding and terminology of GC. Then follow up with the Jones and Lins book "Garbage Collection: Algorithms for Automatic Dynamic Memory Management". Unlike the survey article I point to above, the book is not available for free on the Web; you have to buy it, but it is worth it.



Answer 5:

Richard and Carl have a very nice show on the Windows Memory Model, including the .NET model and GC, in their .NET Rocks! archives:

Jeffrey Richter on the Windows Memory Model



Answer 6:

You might find the short summary of Garbage Collection on the Memory Management Reference useful.

Ultimately, garbage collection has to start at the registers of the processor(s), since any objects that can't be reached by the processor can be recycled. Depending on the language and run-time system, it makes sense to assume statically that the stacks and registers of threads are also reachable, as well as "global variables".
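Here is a sketch of gathering a root set the way this answer describes it: the registers and stack slots of every thread, plus any globals. It also answers the question's "no global variables" case, since the root set then simply comes from the threads alone. RootScanner and ThreadContext are hypothetical names. Values are modeled as integer object IDs, and null marks a slot that holds no reference; real precise collectors use compiler-generated stack maps to know which slots are references.

```java
import java.util.*;

// Hypothetical root-set gathering: thread registers + thread stacks + globals.
class RootScanner {
    static final class ThreadContext {
        final List<Integer> registers;   // object IDs, or null for non-reference values
        final List<Integer> stackSlots;
        ThreadContext(List<Integer> registers, List<Integer> stackSlots) {
            this.registers = registers;
            this.stackSlots = stackSlots;
        }
    }

    static Set<Integer> collectRoots(List<ThreadContext> threads, List<Integer> globals) {
        Set<Integer> roots = new LinkedHashSet<>();   // duplicates collapse, order kept
        for (ThreadContext t : threads) {
            addRefs(roots, t.registers);
            addRefs(roots, t.stackSlots);
        }
        addRefs(roots, globals);                      // may be empty: no globals at all
        return roots;
    }

    private static void addRefs(Set<Integer> roots, List<Integer> slots) {
        for (Integer s : slots) {
            if (s != null) roots.add(s);              // skip slots that hold no reference
        }
    }
}
```

With an empty globals list the scan still finds every object referenced from a register or stack slot, and tracing proceeds from those.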

Stacks probably get you local variables. So in simple GCs you start by scanning thread contexts, their stacks, and the global variables. But that's certainly not true in every case. Some languages don't use stacks or have global variables as such. What's more, GCs can use a barrier so that they don't have to look at every stack or global every time. Some specialised hardware, such as the Symbolics Lisp Machine, had barriers on registers!
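The answer doesn't say how such a barrier works. One common form is a write barrier, and the sketch below shows it applied to globals: every store goes through a method that records the slot as dirty, so a later GC pass rescans only the slots written since the last pass instead of every global. BarrieredGlobals, store and rescanDirty are hypothetical names, and this is a generic illustration rather than the Symbolics design.

```java
import java.util.*;

// Hypothetical write barrier over a table of global slots.
class BarrieredGlobals {
    private final Object[] slots;
    private final BitSet dirty;       // one bit per slot: written since the last scan?

    BarrieredGlobals(int size) {
        slots = new Object[size];
        dirty = new BitSet(size);
    }

    // All writes go through here: the store itself plus the barrier.
    void store(int index, Object value) {
        slots[index] = value;         // the actual write
        dirty.set(index);             // the barrier: remember that this slot changed
    }

    // GC side: visit only dirty slots, return the references found there,
    // then reset the dirty set for the next cycle.
    List<Object> rescanDirty() {
        List<Object> found = new ArrayList<>();
        for (int i = dirty.nextSetBit(0); i >= 0; i = dirty.nextSetBit(i + 1)) {
            if (slots[i] != null) found.add(slots[i]);
        }
        dirty.clear();
        return found;
    }
}
```

The trade-off is a small cost on every store in exchange for GC passes that touch only what changed.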