How do you compile macros in a Lisp compiler?

2019-03-08 15:38发布

问题:

In a Lisp interpreter, there can easily be a branch in eval that can expand a macro, and in the process of expanding it, call functions to build up the expanded expression. I've done this before using low-level macros, it's easily concieved.

But, in a compiler there aren't any functions to call to build up the expanded code: The issue can be seen quite simply in the following example:

(defmacro cube (n)
    (let ((x (gensym)))
      `(let ((,x ,n))
          (* ,x ,x ,x))))

When the macro is expanded by an interpreter, it calls gensym and does what you expect. When expanded by a compiler, you'd generate the code for a let which binds x to (gensym) but the gensymmed symbol is only necessary for the compiler to do the right thing. And since gensym isn't actually called before the macro is compiled, it's not very useful.

This gets even more strange to me when a macro builds up a list to be used as the expansion using map or filter.

So how does this work? Surely the compiled code isn't compiled to (eval *macro-code*) because that'd be horribly inefficient. Is there a well written Lisp compiler where this is clear?

回答1:

How this works is very different in various Lisp dialects. For Common Lisp it is standardized in the ANSI Common Lisp standard and the various Common Lisp implementations differ mostly whether they use a compiler, an interpreter or both.

The following assumes Common Lisp.

EVAL is not the interpreter. EVAL can be implemented with a compiler. Some Common Lisp implementations even don't have an interpreter. Then EVAL is a call to the compiler to compile the code and then calls the compiled code. These implementations use an incremental compiler, which can compile also simple expressions like 2, (+ 2 3), (gensym), and so on.

Macroexpansion is done with the functions MACROEXPANDand MACROEXPAND-1.

A macro in Common Lisp is a function that expects some forms and returns another form. DEFMACRO registers this function as a macro.

Your macro

(defmacro cube (n)
  (let ((x (gensym)))
    `(let ((,x ,n))
        (* ,x ,x ,x))))

is nothing but a Lisp function, which is registered as a macro.

The effect is similar to this:

(defun cube-internal (form environment)
  (destructuring-bind (name n) form   ; the name would be CUBE
    (let ((x (gensym)))
      `(let ((,x ,n))
         (* ,x ,x ,x)))))

(setf (macro-function 'my-cube) #'cube-internal)

In a real CL implementation DEFMACRO expands differently and does not use a name like CUBE-INTERNAL. But conceptually it is defining a macro function and registering it.

When the Lisp compiler sees a macro definition, it usually compiles the macro function and stores it in the current so-called environment. If the environment is the runtime environment, it is remembered at runtime. If the environment is the compiler environment while compiling a file, the macro is forgotten after the file is compiled. The compiled file needs to be loaded so that Lisp then knows the macro.

So, there is a side effect in defining a macro and compiling it. The compiler remembers the compiled macro and stores its code.

When the compiler now sees some code which uses the macro (cube 10), then the compiler just calls the macro function which is stored in the current environment under the name CUBE, calls this macro function which 10 as an argument, and then compiles the generated form. As mentioned above, it is not done directly, but via the MACROEXPAND functions.

Here is the Macro definition:

CL-USER 5 > (defmacro cube (n)
              (let ((x (gensym)))
                `(let ((,x ,n))
                   (* ,x ,x ,x))))
CUBE

We compile the macro:

CL-USER 6 > (compile 'cube)
CUBE
NIL
NIL

MACRO-FUNCTION returns the function for a macro. We can call it like any other function with FUNCALL. It expects two arguments: a whole form like (cube 10) and an environment (here NIL).

CL-USER 7 > (funcall (macro-function 'cube) '(cube 10) nil)
(LET ((#:G2251 10)) (* #:G2251 #:G2251 #:G2251))

It is also possible to take a function (which accepts two arguments: a form and an environment) and store it using SETF as a macro function.

Summary

When the Common Lisp compiler runs, it simply knows the macro functions and calls them when necessary to expand code via the built-in macro expander. The macro functions are simply Lisp code themselves. When the Lisp compiler sees a macro definition, it compiles the macro function, stores it in the current environment and uses it to expand subsequent uses of the macro.

Note: This makes it necessary in Common Lisp that a macro is defined before it can be used by the compiler.



回答2:

There are many approaches to this. One extreme is something called "FEXPER", which are macro-like things that essentially get re-expanded on every evaluation. They caused a lot of noise at some point in the past but have almost completely disappeared. (There are a few people who still do similar things though, newlisp is probably the most popular example.)

So FEXPERs were dumped in favor of macros, which are more "well behaved" in a way. You basically do macro expansion once, and compile the resulting code. As usual, there are a few strategies here, which can lead to different results. For example, "expand once" doesn't specify when it gets expanded. This can happen as soon as the code is read, or (usually) when it is compiled, or even just on the first time it runs.

Another question here -- and that's essentially where you stand -- is in what environment you evaluate the macro code. In most Lisps, everything happens in the same happy global environment. A macro can access functions freely, which can lead to some subtle problems. One outcome of this is that many commercial Common Lisp implementations give you a development environment where you do most of your work and compile things -- this makes the same environment available on both levels. (Actually, since macros can use macros, there are an arbitrary number of levels here.) To deploy an application you get a restricted environment that doesn't have, for example, the compiler (ie, the compile function), since if you deploy code that uses that, your code is essentially a CL compiler. So the idea is that you compile the code on your full implementation, and that expands all macros, which means that the compiled code has no additional uses of macros.

But of course that can lead to those subtle problems that I talked about. For example, some side-effects can lead to a loading-order mess, where you need to load code in a specific order. Worse, you could fall into a trap where code runs one way for you, and another way when it's compiled -- since compiled code already had all macros (and the calls they made) expanded beforehand. There are some hackish solutions to these, like eval-when that specifies certain conditions for evaluating some code. There are also a few package systems for CL where you specify things like loading order (like asdf). Still, there is no real robust solution there, and you can still fall into these traps (see for example this extended rant).

There are alternatives, of course. Most notably, Racket uses its module system. A module can be "instantiated" multiple times, and state is unique to each instance. Now, when some module is used in both macros and in runtime, the two instances of this modules are distinct, which means that compilation is always reliable, and there are none of the above headaches. In the Scheme world, this is known as "separate phases", where each phase (runtime, compile-time, and higher levels with macros-using-macros) has separate module instances. For a good introduction to this and a thorough explanation, read Matthew Flatt's Composable and Compilable Macros. You could also just look at the Racket docs, for example, the Compile and Run-Time Phases section.



回答3:

There's nothing particularly magical about macros.

At a high level, they're simply functions. Functions that return S-Exprs for Lisp forms. The "runtime" for the macro is available in the macroexpand function, which as you may already know, expands macros.

So, you can look at it in terms that the compiler detects that a form is a macro, evaluates it, then compiles the subsequent form that is returned as a result of that macro.

Normally, there is a lot of quoting and splicing and other list surgery within macros to make them easier to write, much like a templating system. But those constructs are not required. You can return an S-Expr built however you want. So, looked at that way, you can see they, at their core, they're just simple functions evaluated at compile time.



回答4:

You found one of the main differences between Lisp and other languages.

In Lisp the execution of dynamically created code is essential, and for example necessary for macro expansion.

While writing a lisp to C compiler I discovered this now obvious thing myself and come to the conclusion that If you want to write a Lisp compiler there are only two solutions:

  1. You write BOTH a compiler and an interpreter so that you can call the interpreter for macro expansion during compilation.

  2. You must be able to dynamically compile code and call it (or using worse "tricks" like compiling a dynamically loadable module and then loading it).

If you are working on a compiler for C one possibility is to use Fabrice Bellard's TCC library that allows direct compilation of C code to a memory buffer.

I'm writing a Lisp to Javascript compiler and in this case there is of course no problem because "the hardware" can handle that nicely and you can ask Javascript to evaluate for example a string "function(...){...}" and then call the resulting object. Using Javascript also solves what is IMO one of the most difficult issues for a Lisp kernel that is proper implementation of lexical closures.

Indeed in my javascript compiler eval is just more or less

(defun eval (x)
    (funcall (js-eval (js-compile x))))

where js-compile is the main compiler interface and given a lisp form will give back a string containing javascript code that when evaluated (with the eval of javascript that I exported to the lisp level as js-eval) executes the code. Interestingly enough also eval is never used (with the only non essential exception of a convenience macro in which I've to execute user defined code during macro expansion).

One important thing to consider is that while Common Lisp has sort of separation between "read time", "compile time" and "run time" still this separation is more logical than physical as the running code is always Lisp. Compiling in Lisp is just calling a function. Even the "parsing" phase is just a regular lisp function executing... it's Lisp all the way down :-)

Links to my Lisp → Js toy compiler

  • Source code
  • Video demo 1
  • Video demo 2
  • Website