Is C++ linkage smart enough to avoid linkage of un

Posted 2019-03-22 10:10

Question:

I'm far from fully understanding how the C++ linker works and I have a specific question about it.

Say I have the following:

Utils.h

namespace Utils
{
    void func1();
    void func2();
}

Utils.cpp

#include "some_huge_lib" // needed only by func2()

namespace Utils
{
    void func1() { /* do something */ }
    void func2() { /* make use of some functions defined in some_huge_lib */ }
}

main.cpp

#include "Utils.h"

int main()
{
  Utils::func1();
}

My goal is to generate the smallest binary files possible.

My question is, will some_huge_lib be included in the output object file?

Answer 1:

Including or linking against large libraries usually makes no difference unless you actually use their code. Linkers should perform dead code elimination, so at build time you won't end up with a large binary full of unused code (read your compiler/linker manual to find out more; this isn't enforced by the C++ standard).

Including lots of headers won't increase your binary size either (but it might substantially increase your compilation time; cf. precompiled headers). The exceptions are global objects and dynamic libraries, which can't be stripped. I also recommend reading this passage (gcc only) about separating code into multiple sections.

One last note about performance: if you use a lot of position-dependent code (i.e. code that can't simply be mapped to any address with relative offsets but needs some 'hotpatching' via a relocation table or similar), there will be a startup cost.



Answer 2:

This depends a lot on what tools and switches you use in order to link and compile.

First, if you link some_huge_lib as a shared library, all of its code and dependencies have to be resolved when the shared library itself is linked. So yes, it'll get pulled in somewhere.

If you link some_huge_lib as an archive, then - it depends. It is good practice for the sanity of the reader to put func1 and func2 in separate source code files, in which case in general the linker will be able to disregard the unused object files and their dependencies.

If however you have both functions in the same file, you will, on some compilers, need to tell them to produce individual sections for each function. Some compilers do this automatically, some don't do it at all. If you don't have this option, pulling in func1 will pull in all the code for func2, and all the dependencies will need to be resolved.



Answer 3:

Think of each function as a node in a graph.
Each node is associated with a piece of binary code - the compiled binary of the node's function.
There is a link (directed edge) between 2 nodes if one node (function) depends on (calls) another.

A static library is primarily a list of such nodes (+ an index).

The program starting-node is the main() function.
The linker traverses the graph from main() and links into the executable all the nodes that are reachable from main(). That's why it is called a linker (the linking maps the function call addresses within the executable).

Unused functions have no links from nodes in the graph emanating from main().
Thus, such disconnected nodes are not reachable and are not included in the final executable.

The executable (as opposed to the static library) is primarily a list of all nodes reachable from main() (+ an index and startup code among other things).
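The traversal described in this answer can be sketched as a plain reachability walk. This is only a toy model, not how any real linker is implemented: the call graph is written out by hand, and `huge_lib_call` is a made-up name standing in for whatever func2() uses from some_huge_lib.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect every function reachable from `root` with an iterative depth-first walk.
std::set<std::string> reachable_from(const CallGraph& graph, const std::string& root) {
    std::set<std::string> visited;
    std::vector<std::string> pending{root};
    while (!pending.empty()) {
        std::string current = pending.back();
        pending.pop_back();
        if (!visited.insert(current).second) {
            continue;  // this node was already visited
        }
        auto it = graph.find(current);
        if (it == graph.end()) {
            continue;  // no outgoing edges recorded for this node
        }
        for (const std::string& callee : it->second) {
            pending.push_back(callee);
        }
    }
    return visited;
}

// The question's program as a graph (huge_lib_call is an assumed placeholder).
CallGraph example_graph() {
    return {
        {"main", {"Utils::func1"}},
        {"Utils::func1", {}},
        {"Utils::func2", {"huge_lib_call"}},
        {"huge_lib_call", {}},
    };
}
```

Calling reachable_from(example_graph(), "main") yields only main and Utils::func1; Utils::func2 and the library call are never visited, which is the "disconnected node" case described above.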



Answer 4:

In addition to other replies, it must be said that normally linkers work in terms of sections, not functions.

Compilers typically let you configure whether they put all of your object code into one monolithic section or split it into a number of smaller ones. For example, the GCC options that switch splitting on are -ffunction-sections (for code) and -fdata-sections (for data); the MSVC option is /Gy (for both). Use -fno-function-sections, -fno-data-sections, and /Gy- respectively to put all code or data into one section.

You might 'play' with compiling your modules in both modes and then dumping them (objdump for GCC, dumpbin for MSVC) to see the generated object file structure.

Once a section is formed by the compiler, the linker treats it as a unit. Sections define symbols and refer to symbols defined in other sections. The linker builds a dependency graph between the sections (starting at a number of roots) and then either discards or keeps each of them entirely. So, if you have a used and an unused function in the same section, the unused function will be kept.
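To make the "section as a unit" point concrete, here is a rough sketch of the keep-or-discard decision at section granularity. It is a simplified model under stated assumptions: the `Section` struct, the two layouts, and the symbol name `huge_lib_call` are all invented for illustration, and a real linker has more roots and more rules than this.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of a section: the symbols it defines and the symbols it refers to.
struct Section {
    std::vector<std::string> defines;
    std::vector<std::string> references;
};

// Return the indices of the sections that survive when starting from `root_symbol`.
// A section is kept or dropped as a whole; it is never split.
std::set<std::size_t> live_sections(const std::vector<Section>& sections,
                                    const std::string& root_symbol) {
    // Map each symbol to the index of the section that defines it.
    std::map<std::string, std::size_t> owner;
    for (std::size_t i = 0; i < sections.size(); ++i) {
        for (const std::string& symbol : sections[i].defines) {
            owner[symbol] = i;
        }
    }

    std::set<std::size_t> live;
    std::vector<std::string> pending{root_symbol};
    while (!pending.empty()) {
        std::string symbol = pending.back();
        pending.pop_back();
        auto it = owner.find(symbol);
        if (it == owner.end()) {
            continue;  // not defined in this model; simply ignored here
        }
        if (!live.insert(it->second).second) {
            continue;  // the owning section is already kept
        }
        // Keeping a section means following every reference it contains,
        // including those made by functions nobody calls.
        for (const std::string& ref : sections[it->second].references) {
            pending.push_back(ref);
        }
    }
    return live;
}

// Layout A: func1 and func2 share one section (no per-function splitting).
std::vector<Section> monolithic_layout() {
    return {
        Section{{"main"}, {"func1"}},
        Section{{"func1", "func2"}, {"huge_lib_call"}},
        Section{{"huge_lib_call"}, {}},
    };
}

// Layout B: one section per function.
std::vector<Section> split_layout() {
    return {
        Section{{"main"}, {"func1"}},
        Section{{"func1"}, {}},
        Section{{"func2"}, {"huge_lib_call"}},
        Section{{"huge_lib_call"}, {}},
    };
}
```

In layout A all three sections stay live, because keeping func1 keeps func2 and therefore its library reference. In layout B only the sections for main and func1 survive.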

There are both benefits and drawbacks in either mode. Turning splitting on means smaller executable files, but larger object files and longer linking times.

It should also be noted that in C++, unlike C, there are certain situations where the One Definition Rule is relaxed and multiple definitions of a function or data object are allowed (for example, in the case of inline functions). The rules are formulated in such a way that the linker is allowed to pick any definition.

From the point of view of sections, putting inline functions together with non-inline ones would mean that in a typical use scenario the linker would be forced to keep virtually every definition of every inline function; that would mean excessive code bloat. Therefore, such functions and data are normally put into their own sections regardless of compiler command line options.
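For reference, these are the kinds of definitions the relaxed rule covers. The snippet is only a sketch of header-style code that may be compiled into many translation units; `clamp_to_byte` and `twice` are made-up names, not anything from the question.

```cpp
// An inline function: every translation unit that includes this definition
// may emit its own copy, and the linker is allowed to keep any one of them.
inline int clamp_to_byte(int value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// A function template: each instantiation (e.g. twice<int>) can likewise be
// emitted in several object files and merged into one at link time.
template <typename T>
T twice(T value) {
    return value + value;
}
```

Dumping an object file that uses these (objdump for GCC, dumpbin for MSVC) is a simple way to check how your toolchain actually places them.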

UPDATE: As @janm correctly pointed out in his comment, the linker must also be instructed to get rid of unreferenced sections by specifying --gc-sections (GNU) or /opt:ref (MS).



Tags: c++ linker