Why can't the linker prevent the C++ static in

Posted 2019-04-07 23:18

乱世女痞

Question:

EDIT: Changed example below to one that actually demonstrates the SIOF.

I am trying to understand all of the subtleties of this problem, because it seems to me to be a major hole in the language. I have read that it cannot be prevented by the linker, but why is this so? It seems trivial to prevent in simple cases, like this:

// A.h
extern int x;

// A.cpp
#include <cstdlib>

int x = std::rand();

// B.cpp
#include "A.h"
#include <iostream>

int y = x;

int main()
{
    std::cout << y; // prints the random value (or garbage)?
}

Here, the linker should be able to easily determine that the initialization code for A.cpp should happen before B.cpp in the linked executable, because B.cpp depends on a symbol defined in A.cpp (and the linker obviously already has to resolve this reference).

So why can't this be generalized to all compilation units? If the linker detects a circular dependency, can't it just fail the link with an error (or perhaps a warning, since I suppose the programmer may intend to define a global symbol in one compilation unit and initialize it in another)?

Does the standard levy any requirements on an implementation to ensure the proper initialization order in simple cases? What is an example of a case where this would not be possible?

I understand that an analogous situation can occur at global destruction time. If the programmer does not carefully ensure that the dependencies during destruction are symmetrical to construction, a similar problem occurs. Could the linker not warn about this scenario as well?

Answer 1:

Linkers traditionally just link - i.e. they resolve addresses. You seem to be wanting them to do semantic analysis of the code. But they don't have access to semantic information - only a bunch of object code. Modern linkers at least can handle large symbol names and discard duplicate symbols to make templates more useable, but so long as linkers and compilers are independent, that's about it. Of course if both linker and compiler are developed by the same team, and if that team is a big corporation, more intelligence can be put in the linker, but it's hard to see how a standard for a portable language can mandate such a thing.

If you want to know more about linkers, BTW, take a look at http://www.iecc.com/linker/ - about the only book on an often ignored tool.

Answer 2:

In theory, there's nothing preventing a linker from handling this -- basically do a topological sort among the dependencies to come up with an initialization order. Existing linkers don't do it though, and C++ mostly depends on existing linkers...
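The topological sort mentioned here might look something like the minimal sketch below. It assumes a hypothetical `DepGraph` that maps each object file to the object files whose symbols it references; real object formats record symbol references, not "this initializer depends on that one", so this is an illustration and not a description of any actual linker. The names `DepGraph` and `init_order` are invented for the example, and the code needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: object file -> object files it references.
using DepGraph = std::map<std::string, std::set<std::string>>;

// Kahn's algorithm. Returns an order in which every file comes after the
// files it depends on, or std::nullopt if the dependencies form a cycle.
std::optional<std::vector<std::string>> init_order(const DepGraph& deps) {
    std::map<std::string, std::size_t> pending;  // unmet dependencies per file
    std::map<std::string, std::vector<std::string>> dependents;
    for (const auto& [file, needs] : deps) {
        pending[file] = needs.size();
        for (const auto& dep : needs) {
            assert(deps.count(dep) == 1 && "dependency must be a known file");
            dependents[dep].push_back(file);
        }
    }

    // Files with no unmet dependencies can be initialized right away.
    std::queue<std::string> ready;
    for (const auto& [file, count] : pending) {
        if (count == 0) ready.push(file);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string file = ready.front();
        ready.pop();
        order.push_back(file);
        for (const auto& user : dependents[file]) {
            if (--pending[user] == 0) ready.push(user);
        }
    }

    // Any file left unordered is part of (or behind) a cycle.
    if (order.size() != deps.size()) return std::nullopt;
    return order;
}
```

For the question's example, a graph where B.o depends on A.o yields A.o followed by B.o. Two files that reference each other yield `std::nullopt`, which is the point where a linker working at this granularity would have to warn or give up.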

Edit: From the viewpoint of the standard, the solution to this problem is utterly trivial: one sentence to require that all objects with static storage duration are initialized prior to main() beginning execution. Unfortunately, about all that would accomplish is raising another area in which virtually nobody conforms with the standard, or (worse) even has a plan to do so. For it to mean anything, the implementers on the committee have to agree that it's sufficiently important that they're going to implement it.

You're right that it's easy to look around and see that people have problems with this. At the same time, I don't know of a single vendor who seems to consider it a real problem. None of them seems to have worked on it yet. None of them has it scheduled for a future release. As far as I can see, it hasn't even made it onto anybody's "it would be nice if we could someday" list.

That brings us back to what I originally said: even though it may look like a serious problem to us as users, it apparently doesn't look that way to most implementers. I can see a number of reasons that might be so. First, of course, is that C++ isn't a key item in anybody's corporate agenda. Microsoft pushes .NET. Sun/Oracle and IBM push Java. Others have their own agendas, but none of them is trying to get you to use C++. It looks to me like most of them consider it a necessary evil, not something to which they really want to devote any effort at all. That being the case, working on completely re-designing the guts of their linker to handle this particular problem would probably only even be open to consideration if they got a lot of complaints about it. That has two problems. First of all, C++ starts out as a fairly small community, so it would take a huge percentage of them before implementers really noticed anything they said. Second, only a fairly small percentage of C++ programmers really run into problems with this anyway. About the only reason they'd bother or care would be if it became an issue for their own, internal development. Unfortunately, most have little reason to care about portability.

Answer 3:

It's because static initialization is a completely different animal than runtime initialization. The initialization of x is—by its nature in your example—dynamic. But it is written as a static initialization. This comes mostly from compatibility with decades of C practice.

One way of resolving such a construct is to compile initialization code for each module which runs before main(), like #pragma startup does in some implementations.

But really, how often does the declaration module not know what the initialization values are?

Answer 4:

In your simple example, a sufficiently smart linker could indeed work out that the initializations in A.o need to run before those in B.o because B.o refers to symbols that are defined in A.o.

But examples as simple as yours don't really demonstrate much of a problem, certainly not something of the "fiasco" level. Here's a slightly more complicated example.

// externs.h
extern int a;
extern int b;

// A.cpp
#include "externs.h"

int a = 5;
int aa = b;

// B.cpp
#include "externs.h"
int b = 10;
int bb = a;

The standard requires that variables in a single compilation unit be initialized in declaration order, so a must be initialized before aa, and b be initialized before bb, but there aren't any further ordering requirements. Initializations from a compilation unit are allowed to be interleaved with those from other compilation units.
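The within-one-compilation-unit guarantee can be observed directly. The sketch below is a single translation unit whose two globals have initializers that run at startup; the helper names `init_log` and `record` are invented for the illustration. The log lives in a function-local static so that it exists before the first call that uses it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static: constructed on first use, so it is ready
// whenever an initializer below calls record().
std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

// Records the name of the variable being initialized, then yields its value.
int record(const char* name, int value) {
    assert(name != nullptr);
    init_log().push_back(name);
    return value;
}

// Both initializers call a function, so they are dynamic initializations.
// Within this one translation unit they run in declaration order.
int a = record("a", 5);
int aa = record("aa", a + 1);  // safe: a is declared earlier in the same unit
```

By the time main() starts, the log holds "a" followed by "aa" and `aa` is 6. Nothing comparable is promised once `a` and `aa` live in different source files.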

There is at least one initialization order that would ensure all variables are initialized before they get used to initialize anything else, while still obeying the standard:

a
b
bb
aa

The linker has only limited information about this program. It knows that the compiled file A.o defines two symbols, a and aa, and that it refers to an external symbol b. Likewise, it knows that B.o defines b and bb and refers to external symbol a. The two object files are mutually dependent, so the linker cannot use the same technique it could have used from your example. In this example, it needs to know that only a has to be defined in order to initialize B.o. The information recorded in the object files, though, doesn't get that specific. It doesn't contain dependencies between symbols.
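To make the gap concrete, here is a small sketch of the symbol-level information that would be needed. The constraint table is an assumption written out by hand; as noted above, object files do not carry it. `Constraint`, `example_constraints` and `order_is_valid` are names invented for the illustration. One caveat: with literal initializers such as 5 and 10, a compiler would normally constant-initialize `a` and `b` before any run-time initialization happens, so the sketch treats all four initializers as if they were computed at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (first, second) means "first must be initialized before second".
using Constraint = std::pair<std::string, std::string>;

// Hand-written symbol-level constraints for the a/aa/b/bb example.
std::vector<Constraint> example_constraints() {
    return {
        {"a", "aa"},  // declaration order inside A.cpp
        {"b", "bb"},  // declaration order inside B.cpp
        {"b", "aa"},  // aa is initialized from b
        {"a", "bb"},  // bb is initialized from a
    };
}

// True if the given initialization order satisfies every constraint.
bool order_is_valid(const std::vector<std::string>& order,
                    const std::vector<Constraint>& constraints) {
    std::map<std::string, std::size_t> position;
    for (std::size_t i = 0; i < order.size(); ++i) position[order[i]] = i;
    for (const auto& [before, after] : constraints) {
        assert(position.count(before) == 1 && position.count(after) == 1);
        if (position.at(before) >= position.at(after)) return false;
    }
    return true;
}
```

The interleaved order a, b, bb, aa passes the check. Running either file's initializers as one block (a, aa, b, bb or b, bb, a, aa) fails it, which is why a linker that only sees "A.o needs B.o and B.o needs A.o" has nothing to sort on.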

Answer 5:

Traditional linkers are not looking at source code or even ASTs, and existing object file formats provide fairly minimal information about exported and external symbols.

Answer 6:

While the linker could perhaps do that, most examples of where you would need it to do so are also examples of bad code lacking cohesion and having high coupling (usually through the horror of global variables). Your example is one such case.

So it is hardly a "fiasco"; that is probably too strong a description. It is merely a minor restriction of the way you might code.

Answer 7:

Any language standard is a compromise among many things. In this case, we're talking about a compromise between ease of implementation and ease of use. If a language is too hard to implement, there will be few or no conforming implementations, and the standard will be useless. If it's too hard to use, nobody will use it, and the standard also will be useless.

Language standard committees will therefore try to limit the demands they place on the implementation, particularly on the more common systems. In modern systems, it's very common to have various different compilers but a shared linker, and therefore a committee will feel much freer to make demands on the compiler writers but go easier on the linkers.

C++ function overloading depended on finding a trick to make it work with existing linkers ("name mangling"). The C90 standard said that identifiers with external linkage were only guaranteed to be distinguished by their first six characters, ignoring case. The rationale (which accompanied the 1989 ANSI version and, IIRC, was dropped for the 1990 ISO standard) said that the committee was very unhappy about keeping that restriction, but felt that dropping it would make it too difficult to implement standard C on too many systems with primitive linkers.

There is something of a chicken-and-egg situation here, in that language designers are reluctant to put demands on linkers, and therefore there's no great push for linkers to evolve, but that's the way things are currently working.