-->

What does lexical scope look like in the memory model?

Posted 2019-08-05 08:09

Question:

Say we have a function:

function foo() {
    var x = 10;         
    function bar() {
        var y = 20; 
        return x + y;   
    }
    return bar();
}

console.log(foo());

What would this look like in a memory model? So far, this is what I imagine it looks like on the stack:

TOP OF THE STACK 
-------------- 
bar()
y = 20
return x + 20
-------------- 
foo()
x = 10
bar()
--------------
BOTTOM OF THE STACK 

What does lexical scope look like, and how does bar know what x is? Is foo() on the heap, or does bar() have a pointer to foo()?

Answer 1:

Well, after the call to foo completes, everything created during the call to it is eligible for garbage collection (GC), because nothing in that code is holding onto anything created during the call. The more interesting question would be what happens if foo returns bar (the function, not bar() the number resulting from calling bar).

But with the code you have, here's the theory of what happens when you call foo (defined in §10.4.3 of the ECMAScript 5 spec):

  1. The engine creates a new declarative environment which is initially the lexical environment and variable environment for that specific call to foo (and normally those don't separate; the with keyword can separate them, but most people don't use it). That declarative environment has a binding object associated with it.

  2. Any declared arguments to foo, the name foo, any variables within foo declared with var, the names of any functions declared via function declarations, and a couple of other things are (in a defined order) created as properties on that binding object (details in §10.5).

  3. The process of creating the bar function (described in §13.2) attaches the lexical environment of the call to foo to the bar function as its [[Scope]] property (not a literal name you can use in code, but the name used in the spec).

  4. The x property of the binding object (i.e., the x variable) gets the value 10.

  5. The call to bar creates a whole new declarative environment, with its own binding object, for the y variable. That new environment gets the value of bar's [[Scope]] property as its outer lexical environment reference, which links it back to the environment in which bar was created (the one for the call to foo).

  6. The y property on the innermost binding object gets the value 20.

  7. The expression x + y is evaluated:

    1. The engine tries to resolve x to get its value. First it looks at the innermost binding object to see if it has a property called x, but it doesn't.

    2. The engine goes to the outer lexical environment of the current one to see if it has an x property on its binding object. Since it does, the engine reads the value of the property and uses that in the expression.

    3. The engine tries to resolve y to get its value. First it looks at the innermost binding object to see if it has a property called y; it does, and so the engine uses that value for the expression.

  8. The engine completes the expression by adding 20 to 10, pushes the result on the stack, and returns out of bar.

  9. At this point, the environment and binding object for the call to bar can be reclaimed via GC.

  10. The engine takes the return value from bar, pushes it on the stack, and returns from foo.

  11. At this point, the environment and binding object for the call to foo can be reclaimed via GC.

  12. The code calls console.log with the result. (Details omitted.)

So in theory, no enduring memory impact. The environments and their binding objects can be tossed.
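
The lookup in the steps above can be sketched in plain JavaScript. This is only a simplified model, not how any engine is actually implemented: createEnvironment, resolve, and the bindings, outer, and scope property names are all made up for illustration, and the global environment is left out.

```javascript
// Simplified model: each environment holds its own bindings plus a link
// to the outer environment. All names here are hypothetical.
function createEnvironment(outer) {
    return { bindings: {}, outer: outer };
}

// Walk outward from the innermost environment until the name is found.
function resolve(env, name) {
    for (let current = env; current !== null; current = current.outer) {
        if (Object.prototype.hasOwnProperty.call(current.bindings, name)) {
            return current.bindings[name];
        }
    }
    throw new ReferenceError(name + " is not defined");
}

// Steps 1-2: calling foo creates an environment with a binding for x.
const fooEnv = createEnvironment(null); // global environment omitted
fooEnv.bindings.x = undefined;

// Step 3: creating bar attaches foo's environment to it (models [[Scope]]).
const barFunction = { name: "bar", scope: fooEnv };

// Step 4: x gets its value.
fooEnv.bindings.x = 10;

// Steps 5-6: calling bar creates a new environment whose outer link is bar's scope.
const barEnv = createEnvironment(barFunction.scope);
barEnv.bindings.y = 20;

// Step 7: x is found one level out, y is found in the innermost environment.
const result = resolve(barEnv, "x") + resolve(barEnv, "y");
console.log(result); // 30
```

Once nothing references barEnv or fooEnv any more, both objects in this model are ordinary garbage, which mirrors Steps 9 and 11.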

Now, in fact, modern JavaScript engines are really smart, and use the stack for certain object allocations so that they don't have to invoke GC to reclaim these environments and binding objects. (But keep reading.)

Now, suppose foo looked like this:

function foo() {
    var x = 10;         
    function bar() {
        var y = 20; 
        return x + y;   
    }
    return bar;
}

And we did this:

var b = foo();

Now, foo returns a reference to bar (without calling it).

Steps 1-4 above are unchanged, but then instead of calling bar, foo returns a reference to it. That means that the environment and binding object created by calling foo are not eligible for GC, because the bar function created during that call has a reference to them, and we have a reference to that function (via the b variable). So in theory at that point, something like this exists on the heap:

+-----+     +-------------+
|  b  |---->|   Function  |
+-----+     +-------------+
            | name: "bar" |     +----------------+
            | [[Scope]]   |---->|   environment  |
            +-------------+     +----------------+     +-------+
                                | Binding Object |---->| x: 10 |
                                +----------------+     +-------+
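
As a rough illustration, the diagram above can be written as nested plain objects. The scope and bindingObject property names are stand-ins invented for this sketch; real engines do not expose [[Scope]] or binding objects to code.

```javascript
// Hypothetical stand-ins for the boxes in the diagram.
let b = {
    name: "bar",
    scope: {                       // models [[Scope]] -> environment
        bindingObject: { x: 10 }   // environment -> binding object
    }
};

// Everything is reachable by following references starting from b.
const reachableX = b.scope.bindingObject.x;
console.log(reachableX); // 10

// Dropping the only reference to the function object makes the whole
// chain unreachable in this model, so it becomes eligible for GC.
b = null;
```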

So if modern engines are smart about allocating these objects on the stack (sometimes), how can they still exist after foo returns? You'd have to dig into the internals of individual engines to be sure. Some probably perform static analysis to see whether the situation is possible and use heap allocation from the start if the binding object can survive. Some may just determine, when foo is returning, what should survive and copy those things from the stack to the heap. Or [insert really smart compiler writer stuff here]. Some engines may be smart enough to only retain the things that can possibly be referenced (so if you had variables in foo that were never referenced in any way by bar, they might be pruned from the binding object). At a high level, the spec requires that it seem like the structure above is retained in memory: nothing we can do in our code can prove that that isn't what happened.

If we then call b, we pick up with the steps above, executing Steps 5 through 9 (with the return value going to b's caller rather than to foo), but when b returns, the structure above continues to exist.
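
One way to observe that the structure persists is a small variation on foo. The x = x + 1 line is not part of the original code; it is added here only so that the surviving environment becomes visible from the outside.

```javascript
function foo() {
    var x = 10;
    function bar() {
        var y = 20;
        x = x + 1;      // added for illustration: mutate the captured variable
        return x + y;
    }
    return bar;
}

var b = foo();
var first = b();      // x becomes 11 in the retained environment
var second = b();     // same environment, so x becomes 12

var other = foo();    // a separate call to foo creates a separate environment
var third = other();  // starts again from x = 10

console.log(first, second, third); // 31 32 31
```

Each call to b gets a fresh environment for y, but x lives in the environment created by the original call to foo, which stays alive as long as b refers to the function.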

This is how JavaScript closures work.