A couple of years ago I started writing an interpreter for a little Domain Specific Language which included programmer-defined functions.
At first I implemented variable scope using a simple stack of symbol-tables. But now I want to move to proper lexical scoping (with the option of closures). Can anyone explain the data-structure and algorithm behind lexical scope?
To get correct lexical scoping and closures in an interpreter, all you need to do is follow these rules:
eval(expression, env) => value
.apply(function, arguments) => value
.{function definition, env-at-definition-time}
.To expand on that last bit in Python-ish syntax:
should be executed as if it were
where the second dict argument may be just the current-env rather than a data structure constructed at that time. (On the other hand, retaining the entire env rather than just the closed-over variables can mean programs have surprising memory leaks because closures are holding onto things the don't need. This is worth fixing in any 'practical' language implementation but not when you are just experimenting with language semantics.)
Stroustrup implemented this in the first C++ compiler simply with one symbol table per scope, and a chaining rule that followed scopes outwards until a definition is found. How this works exactly depends on your precise semantics. Make sure you nail those down first.
Knuth in The Art of Computer Programming, Vol 1, gives an algorithm for a Cobol symbol table whereby scoping is done via links.
There are many different ways to implement lexical scoping. Here are some of my favorites:
If you don't need super-fast performance, use a purely functional data structure to implement your symbol tables, and represent a nested function by a pair containing a pointer to the code and a pointer to the symbol table.
If you need native-code speeds, my favorite technique is described in Making a Fast Curry by Simon Marlow and Simon Peyton Jones.
If you need native-code speeds, but curried functions are not that important, consider closure-passing style.
Read The implementation of Lua 5.0 for instance.
There is no single right way to do this. The important thing is to clearly state the semantics that you are looking to provide, and then the data structures and algorithms will follow.