Hi, I'm a beginner in C and linking. I was reading a book that poses this question about linking with static libraries:
Let a and b denote object modules or static libraries in the current directory, and
let a→b denote that a depends on b, in the sense that b defines a symbol that is
referenced by a. For each of the following scenarios, show the minimal command
line (i.e., one with the least number of object file and library arguments) that will
allow the static linker to resolve all symbol references:
p.o → libx.a → liby.a and liby.a → libx.a → p.o
and the answer given by the book is:
gcc p.o libx.a liby.a libx.a
I'm confused. Shouldn't the answer be:
gcc p.o libx.a liby.a libx.a p.o
Otherwise, how is the undefined symbol in libx.a resolved by p.o?
In case your C textbook does not make it clear, the linkage
behaviour that the author is attempting to illustrate with this
exercise is not mandated by the C Standard and is in fact behaviour
of the GNU binutils linker ld (the default system linker in Linux,
usually invoked on your behalf by gcc|g++|gfortran, etc.) and possibly
but not necessarily the behaviour of other linkers you might encounter.
If you've given us the exercise accurately, the author may be someone who does not understand static
linking quite as well as would be best for writing textbooks about it, or perhaps just doesn't
express themselves with great care.
Unless we are linking a program, the linker by default will not
even insist on resolving all symbol references. So presumably we're
linking a program (not a shared library), and if the answer:
gcc p.o libx.a liby.a libx.a
is actually what the textbook says, then a program is what it has to be.
But a program must have a main function. Where is the main function
and what are its linkage relationships to p.o, libx.a and liby.a? This
matters and we're not told.
So let's assume that p stands for program, and that the main function is at
least defined in p.o. Weird though it would be for liby.a to depend
on p.o where p.o is the main object module of the program, it would be even
weirder for the main function to be defined in a member of a static library.
Assuming that much, here are some source files:
p.c
#include <stdio.h>

extern void x(void);

void p(void)
{
    puts(__func__);
}

int main(void)
{
    x();
    return 0;
}

x.c
#include <stdio.h>

void x(void)
{
    puts(__func__);
}

y.c
#include <stdio.h>

void y(void)
{
    puts(__func__);
}

callx.c
extern void x(void);

void callx(void)
{
    x();
}

cally.c
extern void y(void);

void cally(void)
{
    y();
}

callp.c
extern void p(void);

void callp(void)
{
    p();
}
Compile them all to object files:
$ gcc -Wall -Wextra -c p.c x.c y.c callx.c cally.c callp.c
And make static libraries libx.a and liby.a:
$ ar rcs libx.a x.o cally.o callp.o
$ ar rcs liby.a y.o callx.o
Now, p.o, libx.a and liby.a fulfil the conditions of the exercise:
p.o → libx.a → liby.a and liby.a → libx.a → p.o
Because:
- p.o refers to but does not define x, which is defined in libx.a.
- libx.a defines cally, which refers to but does not define y, which is defined in liby.a.
- liby.a defines callx, which refers to but does not define x, which is defined in libx.a.
- libx.a defines callp, which refers to but does not define p, which is defined in p.o.
We can confirm with nm:
$ nm p.o
0000000000000000 r __func__.2252
U _GLOBAL_OFFSET_TABLE_
0000000000000013 T main
0000000000000000 T p
U puts
U x
p.o defines p ( = T p) and references x ( = U x).
$ nm libx.a
x.o:
0000000000000000 r __func__.2250
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T x
cally.o:
0000000000000000 T cally
U _GLOBAL_OFFSET_TABLE_
U y
callp.o:
0000000000000000 T callp
U _GLOBAL_OFFSET_TABLE_
U p
libx.a defines x ( = T x), references y ( = U y) and references p ( = U p).
$ nm liby.a
y.o:
0000000000000000 r __func__.2250
U _GLOBAL_OFFSET_TABLE_
U puts
0000000000000000 T y
callx.o:
0000000000000000 T callx
U _GLOBAL_OFFSET_TABLE_
U x
liby.a defines y ( = T y) and references x ( = U x).
Now the textbook's linkage certainly succeeds:
$ gcc p.o libx.a liby.a libx.a
$ ./a.out
x
But is it the shortest possible linkage? No. This is:
$ gcc p.o libx.a
$ ./a.out
x
Why? Let's rerun the linkage with diagnostics to show which of our object
files were actually linked:
$ gcc p.o libx.a -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
p.o
(libx.a)x.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
They were:
p.o
(libx.a)x.o
p.o was first linked into the program because an input .o file is
always linked, unconditionally.
Then came libx.a. Read static-libraries to understand how the linker
handled it. After linking p.o, it had only one unresolved reference: the
reference to x. It inspected libx.a looking for an object file that
defines x. It found (libx.a)x.o. It extracted x.o from libx.a and linked
it, and then it was done. [1]
All of the dependency relationships involving liby.a:
- (libx.a)cally.o depends on (liby.a)y.o
- (liby.a)callx.o depends on (libx.a)x.o
are irrelevant to the linkage, because the linkage does not need any
of the object files in liby.a.
Given what the author says is the right answer, we can reverse engineer the
exercise that they were striving to state. This is it:
- An object module p.o that defines main refers to a symbol x that it
  does not define, and x is defined in member x.o of a static library libxz.a.
- (libxz.a)x.o refers to a symbol y that it does not define, and y
  is defined in member y.o of a static library liby.a.
- (liby.a)y.o refers to a symbol z that it does not define, and z
  is defined in member z.o of libxz.a.
- (liby.a)y.o refers to a symbol p that it does not define, and p
  is defined in p.o.
What is the minimal linkage command using p.o, libxz.a, liby.a
that will succeed?
New source files:
p.c
Stays as before.

x.c
#include <stdio.h>

extern void y(void);

void cally(void)
{
    y();
}

void x(void)
{
    puts(__func__);
}

y.c
#include <stdio.h>

extern void z(void);
extern void p(void);

void callz(void)
{
    z();
}

void callp(void)
{
    p();
}

void y(void)
{
    puts(__func__);
}

z.c
#include <stdio.h>

void z(void)
{
    puts(__func__);
}
New static libraries:
$ ar rcs libxz.a x.o z.o
$ ar rcs liby.a y.o
Now the linkage:
$ gcc p.o libxz.a
libxz.a(x.o): In function `cally':
x.c:(.text+0xa): undefined reference to `y'
collect2: error: ld returned 1 exit status
fails, as does:
$ gcc p.o libxz.a liby.a
liby.a(y.o): In function `callz':
y.c:(.text+0x5): undefined reference to `z'
collect2: error: ld returned 1 exit status
and:
$ gcc p.o liby.a libxz.a
libxz.a(x.o): In function `cally':
x.c:(.text+0xa): undefined reference to `y'
collect2: error: ld returned 1 exit status
and (your own pick):
$ gcc p.o liby.a libxz.a p.o
p.o: In function `p':
p.c:(.text+0x0): multiple definition of `p'
p.o:p.c:(.text+0x0): first defined here
p.o: In function `main':
p.c:(.text+0x13): multiple definition of `main'
p.o:p.c:(.text+0x13): first defined here
libxz.a(x.o): In function `cally':
x.c:(.text+0xa): undefined reference to `y'
collect2: error: ld returned 1 exit status
fails with both undefined-reference errors and multiple-definition errors.
But the textbook answer:
$ gcc p.o libxz.a liby.a libxz.a
$ ./a.out
x
is now right.
The author was attempting to describe a mutual dependency between two
static libraries in the linkage of a program, but fumbled the fact that such a mutual dependency
can only exist when the linkage needs at least one object file from each library that
refers to some symbol that is defined by an object file in the other library.
The lessons to be learned from the corrected exercise are:
An object file foo.o that appears in the linker inputs never needs to appear
more than once, because it will be linked unconditionally, and when it is
linked the definition that it provides of any symbol s will serve to resolve
all references to s that accrue from any other linker inputs. If foo.o is
input twice you can only get multiple-definition errors for s.
But where there is a mutual dependency between static libraries in the linkage,
it can be resolved by inputting one of the libraries twice. That works because an object file
is extracted from a static library and linked if and only if that object file is
needed to define an unresolved symbol that the linker is seeking to resolve
at the point when the library is input. So in the corrected example:
- p.o is input and unconditionally linked.
- x becomes an unresolved reference.
- libxz.a is input.
  - A definition of x is found in (libxz.a)x.o.
  - (libxz.a)x.o is extracted and linked.
  - x is resolved.
  - But (libxz.a)x.o refers to y.
  - y becomes an unresolved reference.
- liby.a is input.
  - A definition of y is found in (liby.a)y.o.
  - (liby.a)y.o is extracted and linked.
  - y is resolved.
  - But (liby.a)y.o refers to z.
  - z becomes an unresolved reference.
- libxz.a is input again.
  - A definition of z is found in (libxz.a)z.o.
  - (libxz.a)z.o is extracted and linked.
  - z is resolved.
[1] As the -trace output shows, strictly speaking the linkage was not
done until all the boilerplate following (libx.a)x.o was also linked,
but it's the same boilerplate for every C program linkage.