I am currently working on a flexible C/C++ build framework that I'll (hopefully) open source fairly soon (see this question for some background).
I am using the command below to generate the #include file dependencies for source/header files:
gcc -M -MM -MF
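(A sketch, for concreteness, of how such an invocation might be driven from a script; module.c, module.d and module.o are placeholder names, and -MM/-MF/-MT is just one plausible combination of the dependency-generation flags.)
import subprocess
# -MM lists the #include dependencies of module.c (omitting system headers),
# -MF writes the generated make rule to module.d, -MT names the rule's target.
subprocess.check_call(
    ["gcc", "-MM", "-MF", "module.d", "-MT", "module.o", "module.c"])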
Is there a way of cleverly inferring linker (.o file) dependencies for executables (unit tests + main executable for the target platform, in my case) using gcc/GNU utilities, in a similar way to the above? Currently the framework makes a whole lot of assumptions and is pretty dumb in determining these dependencies.
I have heard of one approach where the nm command can be used to list the undefined symbols in an object file. For example, running nm on an object file (compiled using gcc -c) produces something like this:
nm -o module.o
module.o: U _undefinedSymbol1
module.o: U _undefinedSymbol2
module.o:0000386f T _definedSymbol
One would then look for other object files where these undefined symbols are defined to come up with a list of object file dependencies required to successfully link the file.
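The matching step I have in mind is roughly the following (a minimal sketch with made-up object file names; in practice the two dictionaries would be filled by parsing the nm output above):
# Undefined symbols per object file, and the object file defining each symbol.
needed = {"module.o": {"_undefinedSymbol1", "_undefinedSymbol2"}}
defined = {"_undefinedSymbol1": "helper.o", "_undefinedSymbol2": "other.o"}
# For each object file, keep the objects that define its undefined symbols.
link_deps = {obj: {defined[sym] for sym in syms if sym in defined}
             for obj, syms in needed.items()}
# link_deps == {"module.o": {"helper.o", "other.o"}}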
Is this considered best practice in determining linker dependencies for executables? Are there any other ways of inferring these dependencies? Assume that all object files already exist (i.e. have already been compiled using gcc -c) when proposing your solution.
If there are multiple executables (or even a single executable) that need different sets of dependencies, then the normal, classic way to handle that is to use a library, either a static .a or a shared .so (or equivalent), to hold the object files that can be used by more than one program, and to link the programs with that library. The linker automatically pulls the correct object files out of a static archive. The shared library process is a little different, but the net result is the same: the executable has the correct object files available to it at runtime.
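A minimal sketch of the static-archive route (all file names here are hypothetical): the objects shared between programs go into an archive, and each program links against it.
import subprocess
# "ar rcs" replaces/creates members in the archive and writes a symbol index.
subprocess.check_call(["ar", "rcs", "libcommon.a", "util.o", "parser.o"])
# Link a program-specific object against the archive; the linker extracts only
# the members needed to resolve main.o's undefined symbols.
subprocess.check_call(["gcc", "-o", "prog", "main.o", "-L.", "-lcommon"])
The same two commands can of course be run straight from a shell or a makefile; the subprocess calls are only there to keep the sketch self-contained.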
For any program, there is at least one file unique to the program (normally, that's the file that contains the main() function). There may be a few such files for that program. Those files are probably known about and can be listed easily. The ones that you might need depending on configuration and compilation options are probably shared between programs and are easily handled via the library mechanism.
You have to decide whether you want to use static or shared libraries. Creating shared libraries well is harder than creating static libraries. On the other hand, you can update a shared library and immediately affect all the programs that use it, whereas a static library can be changed but only programs that are relinked with the new library benefit from the changes.
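The shared-library variant of the same sketch (again with hypothetical names): the objects going into the library must be position-independent, and rebuilding libcommon.so updates every program that loads it without relinking them.
import subprocess
# Objects destined for a shared library need -fPIC at compile time.
subprocess.check_call(["gcc", "-fPIC", "-c", "util.c", "parser.c"])
# Build the shared library, then link the program against it.
subprocess.check_call(["gcc", "-shared", "-o", "libcommon.so", "util.o", "parser.o"])
subprocess.check_call(["gcc", "-o", "prog", "main.o", "-L.", "-lcommon"])
# At run time the dynamic loader must be able to find libcommon.so,
# e.g. via an rpath or LD_LIBRARY_PATH.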
The following Python script can be used to collect and process the nm output for all object files in and below the current directory:
#! /usr/bin/env python
import collections
import os
import re
import subprocess
addr_re = r"(?P<address>[0-9a-f]{1,16})?"
code_re = r"(?P<code>[a-z])"
symbol_re = r"(?P<symbol>[a-z0-9_.$]+)"
nm_line_re = re.compile(r"\s+".join([addr_re, code_re, symbol_re]) + r"\s*$", re.I)
requires = collections.defaultdict(set)
provides = collections.defaultdict(set)
def get_symbols(fname):
    # "nm -g" lists only the external (global) symbols of the object file.
    lines = subprocess.check_output(["nm", "-g", fname], universal_newlines=True)
    for line in lines.splitlines():
        m = nm_line_re.match(line)
        if m is None:  # skip blank lines and any headers nm may print
            continue
        symbol = m.group('symbol')
        if m.group('code') == 'U':
            requires[fname].add(symbol)
        else:
            provides[symbol].add(fname)
for dirpath, dirnames, filenames in os.walk("."):
    for f in filenames:
        if f.endswith(".o"):
            get_symbols(os.path.join(dirpath, f))
def pick(fnames):
    # If several files provide a symbol, choose the one with the shortest name.
    best = None
    for fname in fnames:
        if best is None or len(fname) < len(best):
            best = fname
    if len(fnames) > 1:
        best = "*" + best
    return best
for fname, symbols in requires.items():
    dependencies = set(pick(provides[s]) for s in symbols if s in provides)
    print(fname + ': ' + ' '.join(sorted(dependencies)))
The script searches the current directory and all subdirectories for .o files, calls nm for each file found and dissects the resulting output. Symbols which are undefined in one .o file and defined in another are interpreted as a dependency between the two files. Symbols defined nowhere (typically provided by external libraries) are ignored. Finally, the script prints a list of direct dependencies for all object files.
If a symbol is provided by several object files, this script arbitrarily assumes a dependency on the object file with the shortest file name (and marks the chosen file with a * in the output). This behaviour can be changed by modifying the function pick.
The script works for me on Linux and macOS; I haven't tried any other operating systems, and the script is only lightly tested.
The nm utility reads object files (and archives, such as .a libraries) using libbfd. I'm thinking what you'll really want to do is process a database of the public symbols defined in the libraries you know about, and in the object files that are part of this project, so that as you generate each new object file you can look at the undefined symbols in it and determine which object, plain or in a library, you need to link to resolve the references. Essentially you're doing the same job as the linker, but sort of in reverse, so that you know which symbols you can locate.
If you're working with GCC, you can always look into the source packages for your 'binutils' to find the sources to nm, and even to ld if you want that. You certainly don't want to run nm and parse its output when it's just using libbfd under the hood; just call libbfd yourself.