Letrec and reentrant continuations

2019-05-07 07:49发布

问题:

I have been told that the following expression is intended to evaluate to 0, but that many implementations of Scheme evaluate it as 1:

(let ((cont #f))
  (letrec ((x (call-with-current-continuation (lambda (c) (set! cont c) 0)))
           (y (call-with-current-continuation (lambda (c) (set! cont c) 0))))
    (if cont
        (let ((c cont))
          (set! cont #f)
          (set! x 1)
          (set! y 1)
          (c 0))
        (+ x y))))

I must admit that I cannot tell where even to start with this. I understand the basics of continuations and call/cc, but can I get a detailed explanation of this expression?

回答1:

This is an interesting fragment. I came across this question because I was searching for discussions of the exact differences between letrec and letrec*, and how these varied between different versions of the Scheme reports, and different Scheme implementations. While experimenting with this fragment, I did some research and will report the results here.

If you mentally walk through the execution of this fragment, two questions should be salient to you:

Q1. In what order are the initialization clauses for x and y evaluated?

Q2. Are all the initialization clauses evaluated first, and their results cached, and then all the assignments to x and y performed afterwards? Or are some of the assignments made before some of the initialization clauses have been evaluated?

For letrec, the Scheme reports say that the answer to Q1 is "unspecified." Most implementations will in fact evaluate the clauses in left-to-right order; but you shouldn't rely on that behavior.

Scheme R6RS and R7RS introduce a new binding construction letrec* that does specify left-to-right evaluation order. It also differs in some other ways from letrec, as we'll see below.

Returning to letrec, the Scheme reports going back at least as far as R5RS seem to specify that the answer to Q2 is "evaluate all the initialization clauses before making any of the assignments." I say "seem to specify" because the language isn't as explicit about this being required as it might be. As a matter of fact, many Scheme implementations don't conform to this requirement. And this is what's responsible for the difference between the "intended" and "observed" behavior wrt your fragment.

Let's walk through your fragment, with Q2 in mind. First we set aside two "locations" (reference cells) for x and y to be bound to. Then we evaluate one of the initialization clauses. Let's say it's the clause for x, though as I said, with letrec it could be either one. We save the continuation of this evaluation into cont. The result of this evaluation is 0. Now, depending on the answer to Q2, we either assign that result immediately to x or we cache it to make the assignment later. Next we evaluate the other initialization clause. We save its continuation into cont, overwriting the previous one. The result of this evaluation is 0. Now all of the initialization clauses have been evaluated. Depending on the answer to Q2, we might at this point assign the cached result 0 to x; or the assignment to x may have already occurred. In either case, the assignment to y takes place now.

Then we begin evaluating the main body of the (letrec (...) ...) expression (for the first time). There is a continuation stored in cont, so we retrieve it into c, then clear cont and set! each of x and y to 1. Then we invoke the retrieved continuation with the value 0. This goes back to the last-evaluated initialization clause---which we've assumed to be y's. The argument we supply to the continuation is then used in place of the (call-with-current-continuation (lambda (c) (set! cont c) 0)), and will be assigned to y. Depending on the answer to Q2, the assignment of 0 to x may or may not take place (again) at this point.

Then we begin evaluating the main body of the (letrec (...) ...) expression (for the second time). Now cont is #f, so we get (+ x y). Which will be either (+ 1 0) or (+ 0 0), depending on whether 0 was re-assigned to x when we invoked the saved continuation.

You can trace this behavior by decorating your fragment with some display calls, for example like this:

(let ((cont #f))
 (letrec ((x (begin (display (list 'xinit x y cont)) (call-with-current-continuation (lambda (c) (set! cont c) 0))))
          (y (begin (display (list 'yinit x y cont)) (call-with-current-continuation (lambda (c) (set! cont c) 0)))))
  (display (list 'body x y cont))
  (if cont
   (let ((c cont))
    (set! cont #f)
    (set! x 1)
    (set! y 1)
    (c 'new))
   (cons x y))))

I also replaced (+ x y) with (cons x y), and invoked the continuation with the argument 'new instead of 0.

I ran that fragment in Racket 5.2 using a couple of different "language modes", and also in Chicken 4.7. Here are the results. Both implementations evaluated the x init clause first and the y clause second, though as I said this behavior is unspecified.

Racket with #lang r5rs and #lang r6rs conforms to the spec for Q2, and so we get the "intended" result of re-assigning 0 to the other variable when the continuation is invoked. (When experimenting with r6rs, I needed to wrap the final result in a display to see what it would be.)

Here is the trace output:

(xinit #<undefined> #<undefined> #f)
(yinit #<undefined> #<undefined> #<continuation>)
(body 0 0 #<continuation>)
(body 0 new #f)
(0 . new)

Racket with #lang racket and Chicken don't conform to that. Instead, after each initialization clause is evaluated, it gets assigned to the corresponding variable. So when the continuation is invoked, it only ends up re-assigning a value to the final value.

Here is the trace output, with some added comments:

(xinit #<undefined> #<undefined> #f)
(yinit 0 #<undefined> #<continuation>) ; note that x has already been assigned
(body 0 0 #<continuation>)
(body 1 new #f) ; so now x is not re-assigned
(1 . new)

Now, as to what the Scheme reports really do require. Here is the relevant section from R5RS:

library syntax: (letrec <bindings> <body>)

Syntax: <Bindings> should have the form ((<variable1> <init1>) ...), and <body> should be a sequence of one or more expressions. It is an error for a <variable> to appear more than once in the list of variables being bound.

Semantics: The <variable>s are bound to fresh locations holding undefined values, the <init>s are evaluated in the resulting environment (in some unspecified order), each <variable> is assigned to the result of the corresponding <init>, the <body> is evaluated in the resulting environment, and the value(s) of the last expression in <body> is(are) returned. Each binding of a <variable> has the entire letrec expression as its region, making it possible to define mutually recursive procedures.

(letrec ((even?
          (lambda (n)
            (if (zero? n)
                #t
                (odd? (- n 1)))))
         (odd?
          (lambda (n)
            (if (zero? n)
                #f
                (even? (- n 1))))))
  (even? 88))   
===>  #t

One restriction on letrec is very important: it must be possible to evaluate each <init> without assigning or referring to the value of any <variable>. If this restriction is violated, then it is an error. The restriction is necessary because Scheme passes arguments by value rather than by name. In the most common uses of letrec, all the <init>s are lambda expressions and the restriction is satisfied automatically.

The first sentence of the "Semantics" sections sounds like it requires all the assignments to happen after all the initialization clauses have been evaluated; though, as I said earlier, this isn't as explicit as it might be.

In R6RS and R7RS, the only substantial changes to this part of the specification is the addition of a requirement that:

the continuation of each <init> should not be invoked more than once.

R6RS and R7RS also add another binding construction, though: letrec*. This differs from letrec in two ways. First, it does specify a left-to-right evaluation order. Correlatively, the "restriction" noted above can be relaxed somewhat. It's now okay to reference the values of variables that have already been assigned their initial values:

It must be possible to evaluate each <init> without assigning or referring to the value of the corresponding <variable> or the <variable> of any of the bindings that follow it in <bindings>.

The second difference is with respect to our Q2. With letrec*, the specification now requires that the assignments take place after each initialization clause is evaluated. Here is the first paragraph of the "Semantics" from R7RS (draft 6):

Semantics: The <variable>s are bound to fresh locations, each <variable> is assigned in left-to-right order to the result of evaluating the corresponding <init>, the <body> is evaluated in the resulting environment, and the values of the last expression in <body> are returned. Despite the left-to-right evaluation and assignment order, each binding of a <variable> has the entire letrec* expression as its region, making it possible to define mutually recursive procedures.

So Chicken, and Racket using #lang racket---and many other implementations---seem in fact to implement their letrecs as letrec*s.



回答2:

The reason for this being evaluated to 1 is because of (set! x 1). If instead of 1 you set x to 0 then it will result in zero. This is because the continuation variable cont which is storing the continuation is actually storing the continuation for y and not for x as it is being set to y's continuation after x's.