How are closures and scopes represented at run time

Posted 2020-01-26 05:14

Question:

This is mostly an out-of-curiosity question. Consider the following functions:

var closure;
// BigObject stands for any constructor that allocates a large object.
function f0() {
    var x = new BigObject();
    var y = 0;
    closure = function(){ return 7; };
}
function f1() {
    var x = new BigObject();
    closure = (function(y) { return function(){ return y++; }; })(0);
}
function f2() {
    var x = new BigObject();
    var y = 0;
    closure = function(){ return y++; };
}

In every case, after the function has been executed, there is (I think) no way to reach x, and so the BigObject can be garbage collected, as long as x is the last reference to it. A simple-minded interpreter would capture the whole scope chain whenever a function expression is evaluated. (For one thing, you need to do this to make calls to eval work; see the example below.) A smarter implementation might avoid this in f0 and f1. An even smarter implementation would allow y to be retained, but not x, as is needed for f2 to be efficient.
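To make the two strategies concrete, here is a toy model (an illustration only, not how any particular engine lays out memory) in which a scope is an explicit object of mutable cells and a closure is a `{ code, env }` pair. `captureWholeScope` mimics the simple-minded interpreter; `captureFreeVariables` keeps only the cells the body uses. All helper names, and the hand-written `freeVars` list standing in for the compiler's static analysis, are hypothetical.

```javascript
// Toy model of closure capture (illustrative only).
// Each variable lives in a mutable cell { value }, so a closure and its
// enclosing function share storage instead of copying values.
function makeScope(vars) {
  const scope = {};
  for (const [name, value] of Object.entries(vars)) scope[name] = { value };
  return scope;
}

// Simple-minded strategy: the closure keeps the entire enclosing scope alive.
function captureWholeScope(code, scope) {
  return { code, env: scope };
}

// Smarter strategy: keep only the cells the body refers to. `freeVars`
// stands in for the result of the compiler's static analysis.
function captureFreeVariables(code, scope, freeVars) {
  const env = {};
  for (const name of freeVars) env[name] = scope[name];
  return { code, env };
}

// Invoke a modelled closure by handing its environment to its code.
function callClosure(closure) {
  return closure.code(closure.env);
}

// Model of f2: x is a big object, y is a counter; the body uses only y.
const f2Scope = makeScope({ x: new Array(1e5).fill(0), y: 0 });
const body = (env) => env.y.value++;

const naive = captureWholeScope(body, f2Scope);
const selective = captureFreeVariables(body, f2Scope, ["y"]);

console.log("x" in naive.env);     // true: x stays reachable through the closure
console.log("x" in selective.env); // false: x would die with f2's frame
console.log(callClosure(selective)); // 0
console.log(callClosure(naive));     // 1 (both closures share y's cell)
```

Sharing cells rather than copying values is what keeps `y++` correct when the selective strategy is used.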

My question is: how do modern JavaScript engines (JaegerMonkey, V8, etc.) deal with these situations?

Finally, here is an example that shows that variables may need to be retained even if they are never mentioned in the nested function.

var f = (function(x, y){ return function(str) { return eval(str) ; } } )(4, 5) ;
f("1+2") ; // 3
f("x+y") ; // 9
f("x=6") ;
f("x+y") ; // 11

However, there are restrictions that prevent one from sneaking in a call to eval in ways that might be missed by the compiler.

Answer 1:

It's not true that there are restrictions preventing you from calling eval in ways that static analysis would miss; it's just that such indirect references to eval run in the global scope rather than the local one. Note that this is a change in ES5 from ES3, where indirect and direct references to eval both ran in the local scope, so I'm unsure whether any engine actually performs optimizations based on this fact.
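The direct/indirect distinction is easy to check in any ES5+ engine. A minimal sketch (the function and variable names are made up for the example): a direct call to `eval` can see the caller's locals, while `(0, eval)(...)` evaluates in the global scope, where the local is invisible.

```javascript
// Direct vs. indirect eval (ES5 semantics).
function directEval() {
  var secretLocal = 42;
  return eval("typeof secretLocal");      // direct call: sees the local scope
}

function indirectEval() {
  var secretLocal = 42;
  return (0, eval)("typeof secretLocal"); // indirect call: global scope only
}

console.log(directEval());   // "number"
console.log(indirectEval()); // "undefined" (assuming no global named secretLocal)
```

Because only a syntactically direct `eval(...)` can reach local variables, a compiler can spot it while parsing and fall back to keeping the whole scope alive only in that case.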

An obvious way to find out what engines actually do is to make BigObject a really big object and force a GC after running f0–f2. (Because, hey, as much as I think I know the answer, testing is always better!)

So…

The test

var closure;
function BigObject() {
  var a = '';
  for (var i = 0; i <= 0xFFFF; i++) a += String.fromCharCode(i);
  return new String(a); // Turn this into an actual object
}
function f0() {
  var x = new BigObject();
  var y = 0;
  closure = function(){ return 7; };
}
function f1() {
  var x = new BigObject();
  closure =  (function(y) { return function(){return y++;}; })(0);
}
function f2() {
  var x = new BigObject();
  var y = 0;
  closure = function(){ return y++; };
}
function f3() {
  var x = new BigObject();
  var y = 0;
  closure = eval("(function(){ return 7; })"); // direct eval
}
function f4() {
  var x = new BigObject();
  var y = 0;
  closure = (1,eval)("(function(){ return 7; })"); // indirect eval (evaluates in global scope)
}
function f5() {
  var x = new BigObject();
  var y = 0;
  closure = (function(){ return eval("(function(){ return 7; })"); })();
}
function f6() {
  var x = new BigObject();
  var y = 0;
  closure = function(){ return eval("(function(){ return 7; })"); };
}
function f7() {
  var x = new BigObject();
  var y = 0;
  closure = (function(){ return (1,eval)("(function(){ return 7; })"); })();
}
function f8() {
  var x = new BigObject();
  var y = 0;
  closure = function(){ return (1,eval)("(function(){ return 7; })"); };
}
function f9() {
  var x = new BigObject();
  var y = 0;
  closure = new Function("return 7;"); // creates function in global scope
}

I've added tests for eval/Function, seeing as these are also interesting cases. The difference between f5 and f6 is interesting: f5 is effectively identical to f3, because the function it stores in closure is the same; f6 merely stores a function that, once called, returns that function, and because the eval hasn't been evaluated yet, the compiler can't know that it contains no reference to x.

SpiderMonkey

js> gc();
"before 73728, after 69632, break 01d91000\n"
js> f0();
js> gc(); 
"before 6455296, after 73728, break 01d91000\n"
js> f1(); 
js> gc(); 
"before 6455296, after 77824, break 01d91000\n"
js> f2(); 
js> gc(); 
"before 6455296, after 77824, break 01d91000\n"
js> f3(); 
js> gc(); 
"before 6455296, after 6455296, break 01db1000\n"
js> f4(); 
js> gc(); 
"before 12828672, after 73728, break 01da2000\n"
js> f5(); 
js> gc(); 
"before 6455296, after 6455296, break 01da2000\n"
js> f6(); 
js> gc(); 
"before 12828672, after 6467584, break 01da2000\n"
js> f7(); 
js> gc(); 
"before 12828672, after 73728, break 01da2000\n"
js> f8(); 
js> gc(); 
"before 6455296, after 73728, break 01da2000\n"
js> f9(); 
js> gc(); 
"before 6455296, after 73728, break 01da2000\n"

SpiderMonkey appears to GC "x" on everything except f3, f5, and f6.

It appears to GC as much as possible (i.e., y as well as x, when possible) unless there is a direct eval call anywhere in the scope chain of a function that still exists. (This holds even if the function containing the eval call has itself been GC'd and no longer exists, as is the case in f5; in theory, this means x and y could be collected there too.)

V8

gsnedders@dolores:~$ v8 --expose-gc --trace_gc --shell foo.js
V8 version 3.0.7
> gc();
Mark-sweep 0.8 -> 0.7 MB, 1 ms.
> f0();
Scavenge 1.7 -> 1.7 MB, 2 ms.
Scavenge 2.4 -> 2.4 MB, 2 ms.
Scavenge 3.9 -> 3.9 MB, 4 ms.
> gc();   
Mark-sweep 5.2 -> 0.7 MB, 3 ms.
> f1();
Scavenge 4.7 -> 4.7 MB, 9 ms.
> gc();
Mark-sweep 5.2 -> 0.7 MB, 3 ms.
> f2();
Scavenge 4.8 -> 4.8 MB, 6 ms.
> gc();
Mark-sweep 5.3 -> 0.8 MB, 3 ms.
> f3();
> gc();
Mark-sweep 5.3 -> 5.2 MB, 17 ms.
> f4();
> gc();
Mark-sweep 9.7 -> 0.7 MB, 5 ms.
> f5();
> gc();
Mark-sweep 5.3 -> 5.2 MB, 12 ms.
> f6();
> gc();
Mark-sweep 9.7 -> 5.2 MB, 14 ms.
> f7();
> gc();
Mark-sweep 9.7 -> 0.7 MB, 5 ms.
> f8();
> gc();
Mark-sweep 5.2 -> 0.7 MB, 2 ms.
> f9();
> gc();
Mark-sweep 5.2 -> 0.7 MB, 2 ms.

V8 appears to GC x on everything apart from f3, f5, and f6. This is identical to SpiderMonkey; see the analysis above. (Note, however, that the numbers aren't detailed enough to tell whether y is being GC'd when x is not; I've not bothered to investigate this.)
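Heap totals can't show which individual object survived. On engines that support `WeakRef` (ES2021), one can instead probe a single object. A sketch, assuming Node.js started with `node --expose-gc` so that a global `gc()` exists; `wasCollected`, `setupF2` and `setupF6` are hypothetical names, and without the flag the probe returns `null` rather than guessing.

```javascript
// Probe whether one specific object survives a full GC.
let closure; // like the global in the test above, keeps the latest closure alive

async function wasCollected(setup) {
  if (typeof globalThis.gc !== "function") return null; // needs --expose-gc
  const ref = setup();
  // A WeakRef target is kept alive until the current job finishes,
  // so yield to the event loop before forcing a collection.
  await new Promise((resolve) => setTimeout(resolve, 0));
  globalThis.gc();
  return ref.deref() === undefined;
}

// Shaped like f2: the closure mentions only y.
function setupF2() {
  const x = { payload: new Array(1e6).fill(0) };
  let y = 0;
  closure = function () { return y++; };
  return new WeakRef(x);
}

// Shaped like f6: the closure contains a direct eval, which could name x.
function setupF6() {
  const x = { payload: new Array(1e6).fill(0) };
  let y = 0;
  closure = function () { return eval("(function(){ return 7; })"); };
  return new WeakRef(x);
}

(async () => {
  console.log("f2-like, x collected:", await wasCollected(setupF2));
  console.log("f6-like, x collected:", await wasCollected(setupF6));
})();
```

Given the results above, one would expect `true` for the f2-shaped case and `false` for the f6-shaped case, but this is engine- and version-dependent and worth verifying rather than assuming. The same pattern, pointed at y instead of x, would answer the open question about whether y is collected.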

Carakan

I'm not going to bother running this again, but needless to say, the behaviour is identical to SpiderMonkey and V8. It's harder to test without a JS shell, but doable with time.

JSC (Nitro) and Chakra

Building JSC is a pain on Linux, and Chakra doesn't run on Linux. I believe JSC has the same behaviour as the above engines, and I'd be surprised if Chakra didn't too. (Doing anything better quickly becomes very complex; doing anything worse, well, you'd almost never be able to collect anything and would have serious memory issues…)



Answer 2:

In normal situations, local variables in a function are allocated on the stack, and they "automatically" go away when the function returns. I believe many popular JavaScript engines run the interpreter (or JIT compiler) on a stack machine architecture, so this observation should be reasonably valid.

Now, if a variable is referred to in a closure (i.e. by a function defined locally that may be called later on), the "inside" function is assigned a "scope chain" that starts with the innermost scope, which is the function itself. The next scope is the outer function (which contains the local variable being accessed). The interpreter (or compiler) will create a "closure": essentially a piece of memory allocated on the heap (not the stack) that contains the variables in that scope.

Therefore, if local variables are referred to in a closure, they are no longer allocated on the stack (which would make them go away when the function returns). They are allocated just like normal, long-lived variables, and the "scope" contains a pointer to each of them. The "scope chain" of the inner function contains pointers to all these "scopes".
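A toy model of that arrangement, applied to f2 (illustrative only: real engines resolve variables to compiled slot indices rather than looking names up at run time). Each heap-allocated context holds the captured variables of one function plus a pointer to its parent; the `Context` class and its methods are hypothetical.

```javascript
// Toy scope chain: each heap context holds one function's captured
// variables and a pointer to the enclosing context.
class Context {
  constructor(parent, vars) {
    this.parent = parent; // enclosing context, or null at the top
    this.vars = new Map(Object.entries(vars));
  }

  // Resolve a name by walking outward along the chain.
  lookup(name) {
    for (let ctx = this; ctx !== null; ctx = ctx.parent) {
      if (ctx.vars.has(name)) return ctx;
    }
    throw new ReferenceError(name + " is not defined");
  }

  get(name) { return this.lookup(name).vars.get(name); }
  set(name, value) { this.lookup(name).vars.set(name, value); }
}

// f2 after the compiler decides what to heap-allocate: x is not captured,
// so it would stay on the stack and never enter a context; y is captured,
// so it lives in f2's heap context.
const globalContext = new Context(null, {});
const f2Context = new Context(globalContext, { y: 0 });

// The inner function's own context is empty; lookups of y walk one hop out.
const innerContext = new Context(f2Context, {});
function innerBody() {
  const y = innerContext.get("y");
  innerContext.set("y", y + 1);
  return y; // mirrors `return y++;`
}

console.log(innerBody()); // 0
console.log(innerBody()); // 1
console.log(f2Context.vars.has("x")); // false
```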

Some engines optimize the scope chain by omitting variables that are shadowed (i.e. covered up by a local variable in an inner scope), so in your case only one BigObject remains, as long as the variable "x" is only accessed in the inner scope and there are no "eval" calls in the outer scopes. Some engines "flatten" scope chains (I think V8 does that) for fast variable resolution, something that can be done only if there are no "eval" calls in between (or no calls to functions that may do an implicit eval, e.g. setTimeout).
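A sketch of what "flattening" can mean, as a toy model rather than any engine's actual implementation: the closure resolves each free variable once, at creation time, and stores direct references to the variables' cells in a fixed array, so later accesses skip the chain walk. The helper names are hypothetical. Note that building the array needs the complete list of free variables up front, which is exactly what a direct eval in the body makes unknowable.

```javascript
// Toy flat closure: resolve free variables once, then index a flat array.
function cell(value) { return { value }; }

// Scope chain for: function outer() { var a = 1; function middle() { var b = 2; ... } }
const chain = [
  new Map([["b", cell(2)]]), // middle's scope (innermost)
  new Map([["a", cell(1)]]), // outer's scope
];

// Find a variable's cell by searching the chain from the inside out.
function resolve(chainList, name) {
  for (const scope of chainList) {
    if (scope.has(name)) return scope.get(name);
  }
  throw new ReferenceError(name + " is not defined");
}

// Store references to cells (not copies of values) so mutations are shared.
function makeFlatClosure(freeVars, chainList, body) {
  const slots = freeVars.map((name) => resolve(chainList, name));
  return () => body(slots);
}

// Body of: function () { return a + b++; } with slots[0] = a, slots[1] = b
const flat = makeFlatClosure(["a", "b"], chain,
  (slots) => slots[0].value + slots[1].value++);

console.log(flat()); // 3
console.log(flat()); // 4 (b's cell is shared, so the increment persists)
```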

I'd invite some JavaScript engine gurus to provide more juicy details than I can.