Why are nested functions not supported by the C st

2019-01-14 17:18发布

站内文章 / C

33 0

啃猪蹄的小仙女

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

It doesn't seem like it would be too hard to implement in assembly.

gcc also has a flag (-fnested-functions) to enable their use.

回答1:

It turns out they're not actually all that easy to implement properly.

Should an internal function have access to the containing scope's variables? If not, there's no point in nesting it; just make it static (to limit visibility to the translation unit it's in) and add a comment saying "This is a helper function used only by myfunc()".

If you want access to the containing scope's variables, though, you're basically forcing it to generate closures (the alternative is restricting what you can do with nested functions enough to make them useless). I think GCC actually handles this by generating (at runtime) a unique thunk for every invocation of the containing function, that sets up a context pointer and then calls the nested function. This ends up being a rather Icky hack, and something that some perfectly reasonable implementations can't do (for example, on a system that forbids execution of writable memory - which a lot of modern OSs do for security reasons). The only reasonable way to make it work in general is to force all function pointers to carry around a hidden context argument, and all functions to accept it (because in the general case you don't know when you call it whether it's a closure or an unclosed function). This is inappropriate to require in C for both technical and cultural reasons, so we're stuck with the option of either using explicit context pointers to fake a closure instead of nesting functions, or using a higher-level language that has the infrastructure needed to do it properly.

回答2:

I'd like to quote something from the BDFL (Guido van Rossum):

This is because nested function definitions don't have access to the local variables of the surrounding block -- only to the globals of the containing module. This is done so that lookup of globals doesn't have to walk a chain of dictionaries -- as in C, there are just two nested scopes: locals and globals (and beyond this, built-ins). Therefore, nested functions have only a limited use. This was a deliberate decision, based upon experience with languages allowing arbitraries nesting such as Pascal and both Algols -- code with too many nested scopes is about as readable as code with too many GOTOs.

Emphasis is mine.

I believe he was referring to nested scope in Python (and as David points out in the comments, this was from 1993, and Python does support fully nested functions now) -- but I think the statement still applies.

The other part of it could have been closures.

If you have a function like this C-like code:

(*int()) foo() {
    int x = 5;
    int bar() {
        x = x + 1;
        return x;
    }
    return &bar;
}

If you use bar in a callback of some sort, what happens with x? This is well-defined in many newer, higher-level languages, but AFAIK there's no well-defined way to track that x in C -- does bar return 6 every time, or do successive calls to bar return incrementing values? That could have potentially added a whole new layer of complication to C's relatively simple definition.

回答3:

See C FAQ 20.24 and the GCC manual for potential problems:

If you try to call the nested function through its address after the containing function has exited, all hell will break loose. If you try to call it after a containing scope level has exited, and if it refers to some of the variables that are no longer in scope, you may be lucky, but it's not wise to take the risk. If, however, the nested function does not refer to anything that has gone out of scope, you should be safe.

This is not really more severe than some other problematic parts of the C standard, so I'd say the reasons are mostly historical (C99 isn't really that different from K&R C feature-wise).

There are some cases where nested functions with lexical scope might be useful (consider a recursive inner function which doesn't need extra stack space for the variables in the outer scope without the need for a static variable), but hopefully you can trust the compiler to correctly inline such functions, ie a solution with a seperate function will just be more verbose.

回答4:

Nested functions are a very delicate thing. Will you make them closures? If not, then they have no advantage to regular functions, since they can't access any local variables. If they do, then what do you do to stack-allocated variables? You have to put them somewhere else so that if you call the nested function later, the variable is still there. This means they'll take memory, so you have to allocate room for them on the heap. With no GC, this means that the programmer is now in charge of cleaning up the functions. Etc... C# does this, but they have a GC, and it's a considerably newer language than C.

回答5:

It also wouldn't be too hard to add members functions to structs but they are not in the standard either.

Features are not added to C standard based on soley whether or not they are easy to implement. It's a combination of many other factors including the point in time in which the standard was written and what was common / practical then.

回答6:

ANSI C has been established for 20 years. Perhaps between 1983 and 1989 the committee may have discussed it in the light of the state of compiler technology at the time but if they did their reasoning is lost in dim and distant past.

回答7:

One more reason: it is not at all clear that nested functions are valuable. Twenty-odd years ago I used to do large scale programming and maintenance in (VAX) Pascal. We had lots of old code that made heavy use of nested functions. At first, I thought this was way cool (compared to K&R C, which I had been working in before) and started doing it myself. After awhile, I decided it was a disaster, and stopped.

The problem was that a function could have a great many variables in scope, counting the variables of all the functions in which it was nested. (Some old code had ten levels of nesting; five was quite common, and until I changed my mind I coded a few of the latter myself.) Variables in the nesting stack could have the same names, so that "inner" function local variables could mask variables of the same name in more "outer" functions. A local variable of a function, that in C-like languages is totally private to it, could be modified by a call to a nested function. The set of possible combinations of this jazz was near infinite, and a nightmare to comprehend when reading code.

So, I started calling this programming construct "semi-global variables" instead of "nested functions", and telling other people working on the code that the only thing worse than a global variable was a semi-global variable, and please do not create any more. I would have banned it from the language, if I could. Sadly, there was no such option for the compiler...

回答8:

I disagree with Dave Vandervies.

Defining a nested function is much better coding style than defining it in global scope, making it static and adding a comment saying "This is a helper function used only by myfunc()".

What if you needed a helper function for this helper function? Would you add a comment "This is a helper function for the first helper function used only by myfunc"? Where do you take the names from needed for all those functions without polluting the namespace completely?

How confusing can code be written?

But of course, there is the problem with how to deal with closuring, i.e. returning a pointer to a function that has access to variables defined in the function from which it is returned.

回答9:

Either you don't allow references to local variables of the containing function in the contained one, and the nesting is just a scoping feature without much use, or you do. If you do, it is not a so simple feature: you have to be able to call a nested function from another one while accessing the correct data, and you also have to take into account recursive calls. That's not impossible -- techniques are well known for that and where well mastered when C was designed (Algol 60 had already the feature). But it complicates the run-time organization and the compiler and prevent a simple mapping to assembly language (a function pointer must carry on information about that; well there are alternatives such as the one gcc use). It was out of scope for the system implementation language C was designed to be.