A couple of years ago I started writing an interpreter for a little Domain Specific Language which included programmer-defined functions.
At first I implemented variable scope using a simple stack of symbol-tables. But now I want to move to proper lexical scoping (with the option of closures). Can anyone explain the data-structure and algorithm behind lexical scope?
To get correct lexical scoping and closures in an interpreter, all you need to do is follow these rules:
- In your interpreter, variables are always looked up in an environment table passed in by the caller or kept as a variable, not some global env-stack. The signature of your eval operation is like
eval(expression, env) => value
.
- When interpreted code calls a function, the environment is NOT passed to that function. The signature of your function application operation is like
apply(function, arguments) => value
.
- When an interpreted function is called, the environment its body is evaluated in is the environment in which the function definition was made, and has nothing whatsoever to do with the caller. So if you have a local function, then it is a closure, that is, a data structure containing fields
{function definition, env-at-definition-time}
.
To expand on that last bit in Python-ish syntax:
x = 1
return lambda y: x + y
should be executed as if it were
x = 1
return make_closure(<AST for "lambda y: x + y">, {"x": x})
where the second dict argument may be just the current-env rather than a data structure constructed at that time. (On the other hand, retaining the entire env rather than just the closed-over variables can mean programs have surprising memory leaks because closures are holding onto things the don't need. This is worth fixing in any 'practical' language implementation but not when you are just experimenting with language semantics.)
There are many different ways to implement lexical scoping. Here are some of my favorites:
If you don't need super-fast performance, use a purely functional data structure to implement your symbol tables, and represent a nested function by a pair containing a pointer to the code and a pointer to the symbol table.
If you need native-code speeds, my favorite technique is described in Making a Fast Curry by Simon Marlow and Simon Peyton Jones.
If you need native-code speeds, but curried functions are not that important, consider closure-passing style.
There is no single right way to do this. The important thing is to clearly state the semantics that you are looking to provide, and then the data structures and algorithms will follow.
Read The implementation of Lua 5.0 for instance.
Stroustrup implemented this in the first C++ compiler simply with one symbol table per scope, and a chaining rule that followed scopes outwards until a definition is found. How this works exactly depends on your precise semantics. Make sure you nail those down first.
Knuth in The Art of Computer Programming, Vol 1, gives an algorithm for a Cobol symbol table whereby scoping is done via links.