Replacing ld with gold - any experience?

2019-01-06 11:12发布

问题:

Has anyone tried to use gold instead of ld?

gold promises to be much faster than ld, so it may help speeding up test cycles for large C++ applications, but can it be used as drop-in replacement for ld?

Can gcc/g++ directly call gold.?

Are there any know bugs or problems?

Although gold is part of the GNU binutils since a while, I have found almost no "success stories" or even "Howtos" in the Web.

(Update: added links to gold and blog entry explaining it)

回答1:

At the moment it is compiling bigger projects on Ubuntu 10.04. Here you can install and integrate it easily with the binutils-gold package (if you remove that package, you get your old ld). Gcc will automatically use gold then.

Some experiences:

  • gold doesn't search in /usr/local/lib
  • gold doesn't assume libs like pthread or rt, had to add them by hand
  • it is faster and needs less memory (the later is important on big C++ projects with a lot of boost etc.)

What does not work: It cannot compile kernel stuff and therefore no kernel modules. Ubuntu does this automatically via DKMS if it updates proprietary drivers like fglrx. This fails with ld-gold (you have to remove gold, restart DKMS, reinstall ld-gold.



回答2:

As it took me a little while to find out how to selectively use gold (i.e. not system-wide using a symlink), I'll post the solution here. It's based on http://code.google.com/p/chromium/wiki/LinuxFasterBuilds#Linking_using_gold .

  1. Make a directory where you can put a gold glue script. I am using ~/bin/gold/.
  2. Put the following glue script there and name it ~/bin/gold/ld:

    #!/bin/bash
    gold "$@"
    

    Obviously, make it executable, chmod a+x ~/bin/gold/ld.

  3. Change your calls to gcc to gcc -B$HOME/bin/gold which makes gcc look in the given directory for helper programs like ld and thus uses the glue script instead of the system-default ld.



回答3:

Can gcc/g++ directly call gold.?

Just to complement the answers: there is a gcc's option -fuse-ld=gold (see gcc doc). Though, AFAIK, it is possible to configure gcc during the build in a way that the option will not have any effect.



回答4:

You could link ld to gold (in a local binary directory if you have ld installed to avoid overwriting):

ln -s `which gold` ~/bin/ld

or

ln -s `which gold` /usr/local/bin/ld


回答5:

As a Samba developer, I have been using the gold linker almost exclusively on Ubuntu, Debian, and Fedora since several years now. My assessment:

  • gold is many times (felt: 5-10 times) faster than the classical linker.
  • Initially, there were a few problems, but the have gone since roughly Ubuntu 12.04.
  • The gold linker even found some dependency problems in our code, since it seems to be more correct than the classical one with respect to some details. See, e.g. this Samba commit.

I have not used gold selectively, but have been using symlinks or the alternatives mechanism if the distribution provides it.



回答6:

Some projects seem to be incompatible with gold, because of some incompatible differences between ld and gold. Example: OpenFOAM, see http://www.openfoam.org/mantisbt/view.php?id=685 .



回答7:

DragonFlyBSD switched over to gold as their default linker. So it seems to be ready for a variety of tools.
More details: http://phoronix.com/scan.php?page=news_item&px=DragonFlyBSD-Gold-Linker



回答8:

Minimal synthetic benchmark

Outcome: gold about 2x to 3x faster for all values I've tried.

generate-objects

#!/usr/bin/env bash
set -eu

# CLI args.

# Each of those files contains n_ints_per_file ints.
n_int_file_is="${1:-10}"
n_ints_per_file="${2:-10}"

# Each function adds all ints from all files.
# This leads to n_int_file_is x n_ints_per_file x n_funcs relocations.
n_funcs="${3:-10}"

# Do a debug build, since it is for debug builds that link time matters the most,
# as the user will be recompiling often.
cflags='-ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic'

# Cleanup previous generated files objects.
./clean

# Generate i_*.c, ints.h and int_sum.h
rm -f ints.h
echo 'return' > int_sum.h
int_file_i=0
while [ "$int_file_i" -lt "$n_int_file_is" ]; do
  int_i=0
  int_file="${int_file_i}.c"
  rm -f "$int_file"
  while [ "$int_i" -lt "$n_ints_per_file" ]; do
    echo "${int_file_i} ${int_i}"
    int_sym="i_${int_file_i}_${int_i}"
    echo "unsigned int ${int_sym} = ${int_file_i};" >> "$int_file"
    echo "extern unsigned int ${int_sym};" >> ints.h
    echo "${int_sym} +" >> int_sum.h
    int_i=$((int_i + 1))
  done
  int_file_i=$((int_file_i + 1))
done
echo '1;' >> int_sum.h

# Generate funcs.h and main.c.
rm -f funcs.h
cat <<EOF >main.c
#include "funcs.h"

int main(void) {
return
EOF
i=0
while [ "$i" -lt "$n_funcs" ]; do
  func_sym="f_${i}"
  echo "${func_sym}() +" >> main.c
  echo "int ${func_sym}(void);" >> funcs.h
  cat <<EOF >"${func_sym}.c"
#include "ints.h"

int ${func_sym}(void) {
#include "int_sum.h"
}
EOF
  i=$((i + 1))
done
cat <<EOF >>main.c
1;
}
EOF

# Generate *.o
ls | grep -E '\.c$' | parallel --halt now,fail=1 -t --will-cite "gcc $cflags -c -o '{.}.o' '{}'"

GitHub upstream.

Given an input of type:

./generate-objects [n_int_file_is [n_ints_per_file [n_funcs]]]

This generates a main that does:

return f_0() + f_1() + ... + f_(n_funcs)()

where each function is defined in a separate f_n.c file, and adds n_int_file_is times n_ints_per_file extern ints:

int f_0() { return i_0_0 + i_0_1 + ... + i_(n_int_file_is)_(n_ints_per_file); }

This leads to:

n_int_file_is x n_ints_per_file x n_funcs

relocations on the link.

Then I compared:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic               -o main *.o
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -fuse-ld=gold -o main *.o

for various input triplets, which gave:

10000 10 10
nogold: wall=3.70s user=2.93s system=0.75s max_mem=556356kB
gold:   wall=1.43s user=1.15s system=0.28s max_mem=703060kB

1000 100 10
nogold: wall=1.23s user=1.07s system=0.16s max_mem=188152kB
gold:   wall=0.60s user=0.52s system=0.07s max_mem=279108kB

100 1000 10
nogold: wall=0.96s user=0.87s system=0.08s max_mem=149636kB
gold:   wall=0.53s user=0.47s system=0.05s max_mem=231596kB

10000 10 100
nogold: wall=11.63s user=10.31s system=1.25s max_mem=1411264kB
gold:   wall=6.31s user=5.77s system=0.53s max_mem=2146992kB

1000 100 100
nogold: wall=7.19s user=6.56s system=0.60s max_mem=1058432kB
gold:   wall=4.15s user=3.81s system=0.34s max_mem=1697796kB

100 1000 100
nogold: wall=6.15s user=5.58s system=0.57s max_mem=1031372kB
gold:   wall=4.06s user=3.76s system=0.29s max_mem=1652548kB

Some limits I've been trying to mitigate:

  • at 100k C files, both methods get failed mallocs occasionally
  • GCC cannot compile a function with 1M additions

Tested on Ubuntu 18.10, GCC 8.2.0, Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).

I have also observed a 2x in the debug build of gem5: https://gem5.googlesource.com/public/gem5/+/fafe4e80b76e93e3d0d05797904c19928587f5b5