What is this madness?

Posted 2020-07-20 04:00

Question:

I've never seen anything like this; I can't seem to wrap my head around it. What does this code even do? It looks super fancy, and I'm pretty sure this stuff is not described anywhere in my C book. :(

union u;
typedef union u (*funcptr)();

union u {
  funcptr f;
  int i;
};

typedef union u $;

int main() {
  int printf(const char *, ...);

  $ fact =
      ($){.f = ({
            $ lambda($ n) {
              return ($){.i = n.i == 0 ? 1 : n.i * fact.f(($){.i = n.i - 1}).i};
            }
            lambda;
          })};

  $ make_adder = ($){.f = ({
                       $ lambda($ n) {
                         return ($){.f = ({
                                      $ lambda($ x) {
                                        return ($){.i = n.i + x.i};
                                      }
                                      lambda;
                                    })};
                       }
                       lambda;
                     })};

  $ add1 = make_adder.f(($){.i = 1});

  $ mul3 = ($){.f = ({
                 $ lambda($ n) { return ($){.i = n.i * 3}; }
                 lambda;
               })};

  $ compose = ($){
      .f = ({
        $ lambda($ f, $ g) {
          return ($){.f = ({
                       $ lambda($ n) {
                         return ($){.i = f.f(($){.i = g.f(($){.i = n.i}).i}).i};
                       }
                       lambda;
                     })};
        }
        lambda;
      })};

  $ mul3add1 = compose.f(mul3, add1);

  printf("%d\n", fact.f(($){.i = 5}).i);
  printf("%d\n", mul3.f(($){.i = add1.f(($){.i = 10}).i}).i);
  printf("%d\n", mul3add1.f(($){.i = 10}).i);
  return 0;
}

Answer 1:

This example primarily builds on two GCC extensions: nested functions, and statement expressions.

The nested function extension allows you to define a function within the body of another function. Regular block scoping rules apply, so the nested function has access to the local variables of the outer function when it is called:

int outer(int x) {
    int inner(int y) {   // nested function: can see outer's x
        return x + y;
    }
    return inner(6);
}

...
int z = outer(4); // z == 10

The statement expression extension allows you to wrap up a C block statement (any code you would normally be able to place within braces: variable declarations, for loops, etc.) for use in a value-producing context. It looks like a block statement in parentheses:

int foo(int x) {
    return 5 + ({
        int y = 0;
        while (y < 10) ++y;
        x + y;           // value of the whole statement expression
    });
}

...
int z = foo(6); // z == 21 (5 + 6 + 10)

The last statement in the wrapped block provides the value. So it works pretty much like you might imagine an inlined function body.

These two extensions used in combination let you define a function body with access to the variables of the surrounding scope, and use it immediately in an expression, creating a kind of basic lambda expression. Since a statement expression can contain any statement, and a nested function definition is a statement, and a function's name is a value, a statement expression can define a function and immediately return a pointer to that function to the surrounding expression:

int foo(int x) {
    int (*f)(int) = ({      // statement expression
        int nested(int y) { // statement 1: function definition
            return x + y;
        }
        nested;             // statement 2 (value-producing): function name
    });                     // f == nested

    return f(6); // return nested(6) == return x + 6
}

The code in the example dresses this up further by using the dollar sign as a short typedef name for union u, which serves as both the parameter type and the return type of every function (allowing $ in identifiers is another GCC extension, much less important to the functionality of the example). lambda in the example isn't a keyword or macro (though the dollar sign is supposed to make it look like one); it's just the name of the function being defined within the statement expression's scope, reused several times. C's scope nesting rules make it perfectly OK to reuse the same name within a deeper scope (nested "lambdas"), especially when the body code isn't expected to use the name for any other purpose (lambdas are normally anonymous, so the functions aren't expected to "know" that they're actually called lambda).

If you read the GCC documentation for nested functions, you'll see that this technique is quite limited, though. A nested function expires when the lifetime of its containing frame ends. That means nested functions can't be returned from the function that defines them, and they can't really be stored usefully (make_adder and compose in the question do return them, so that code relies on undefined behavior and only appears to work). They can be passed by pointer down into other functions called from the containing frame that expect a normal function pointer, so they are still fairly useful. But they don't have anywhere near the flexibility of true lambdas, which take ownership (shared or total, depending on the language) of the variables they close over, and can be passed in all directions as true values or stored for later use by a completely unrelated part of the program. The syntax is also fairly ungainly, even if you wrap it up in a lot of helper macros.

At the time of writing, true lambdas had been proposed for the next version of the language, then called C2x; the feature was ultimately not included in C23. You can read more about the proposed form here. It doesn't really look much like this (it copies the anonymous function syntax and semantics found in Objective-C). The functions created this way have lifetimes that can exceed their creating scope; the function bodies are true expressions, without the need for a statement-containing hack; and the functions themselves are truly anonymous, with no intermediate names like lambda required.


Under that proposal, the above example would look something like this:

#include <stdio.h>

int main(void) {
  typedef int (^ F)(int);

  __block F fact;  // needs to be mutable - block can't copy-capture
                   // its own variable before initializing it
  fact = ^(int n) {
    return n == 0 ? 1 : n * fact(n - 1);
  };

  F (^ make_adder)(int) = ^(int n) {
    return _Closure_copy(^(int x) { return n + x; });
  };

  F add1 = make_adder(1);

  F mul3 = ^(int n) { return n * 3; };

  F (^ compose)(F, F) = ^(F f, F g) {
    return _Closure_copy(^(int n) { return f(g(n)); });
  };

  F mul3add1 = compose(mul3, add1);

  printf("%d\n", fact(5));
  printf("%d\n", mul3(add1(10)));
  printf("%d\n", mul3add1(10));

  _Closure_free(add1);
  _Closure_free(mul3add1);

  return 0;
}

Much simpler without all that union stuff.

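(You can compile and run this modified example in Clang right now: use the -fblocks flag to enable the blocks extension, add #include <Block.h> to the top of the file, and replace _Closure_copy and _Closure_free with Block_copy and Block_release respectively. On platforms other than macOS you will probably also need a blocks runtime library installed and linked, typically with -lBlocksRuntime.)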



Tags: c gcc