I just read this excellent article: http://neugierig.org/software/chromium/notes/2011/08/static-initializers.html
and then I tried: https://gcc.gnu.org/onlinedocs/gccint/Initialization.html
What it says about finding initializers does not work for me though. The .ctors
section is not available, but I could find .init_array
(see also Can't find .dtors and .ctors in binary). But how do I interpret the output? I mean, summing up the size of the pages can also be handled by the size
command and its .bss
column - or am I missing something?
Furthermore, nm
does not report any *_GLOBAL__I_*
symbols, only *_GLOBAL__N_*
functions, and - more interesting - _GLOBAL__sub_I_somefile.cpp
entries. The latter probably indicates files with global initialization. But can I somehow get a list of constructors that are being run? Ideally, a tool would give me a list of
Foo::Foo in file1.cpp:12
Bar::Bar in file2.cpp:45
...
(assuming I have debug symbols available). Is there such a tool? If not, how could one write it? Does the .init_array
section contain pointers to code which could be translated via some DWARF magic to the above?
As you already observed, the implementation details of contructors/initialization functions are highly compiler (version) dependent. While I am not aware of a tool for this, what current GCC/clang versions do is simple enough to let a small script do the job: .init_array
is just a list of entry points. objdump -s
can be used to load the list, and nm
to lookup the symbol names. Here's a Python script that does that. It should work for any binary that was generated by the said compilers:
#!/usr/bin/env python
import os
import sys
# Load .init_array section
objdump_output = os.popen("objdump -s '%s' -j .init_array" % (sys.argv[1].replace("'", r"\'"),)).read()
is_64bit = "x86-64" in objdump_output
init_array = objdump_output[objdump_output.find("Contents of section .init_array:") + 33:]
initializers = []
for line in init_array.split("\n"):
parts = line.split()
if not parts:
continue
parts.pop(0) # Remove offset
parts.pop(-1) # Remove ascii representation
if is_64bit:
# 64bit pointers are 8 bytes long
parts = [ "".join(parts[i:i+2]) for i in range(0, len(parts), 2) ]
# Fix endianess
parts = [ "".join(reversed([ x[i:i+2] for i in range(0, len(x), 2) ])) for x in parts ]
initializers += parts
# Load disassembly for c++ constructors
dis_output = os.popen("objdump -d '%s' | c++filt" % (sys.argv[1].replace("'", r"\'"), )).read()
def find_associated_constructor(disassembly, symbol):
# Find associated __static_initialization function
loc = disassembly.find("<%s>" % symbol)
if loc < 0:
return False
loc = disassembly.find(" <", loc)
if loc < 0:
return False
symbol = disassembly[loc+2:disassembly.find("\n", loc)][:-1]
if symbol[:23] != "__static_initialization":
return False
address = disassembly[disassembly.rfind(" ", 0, loc)+1:loc]
loc = disassembly.find("%s <%s>" % (address, symbol))
if loc < 0:
return False
# Find all callq's in that function
end_of_function = disassembly.find("\n\n", loc)
symbols = []
while loc < end_of_function:
loc = disassembly.find("callq", loc)
if loc < 0 or loc > end_of_function:
break
loc = disassembly.find("<", loc)
symbols.append(disassembly[loc+1:disassembly.find("\n", loc)][:-1])
return symbols
# Load symbol names, if available
nm_output = os.popen("nm '%s'" % (sys.argv[1].replace("'", r"\'"), )).read()
nm_symbols = {}
for line in nm_output.split("\n"):
parts = line.split()
if not parts:
continue
nm_symbols[parts[0]] = parts[-1]
# Output a list of initializers
print("Initializers:")
for initializer in initializers:
symbol = nm_symbols[initializer] if initializer in nm_symbols else "???"
constructor = find_associated_constructor(dis_output, symbol)
if constructor:
for function in constructor:
print("%s %s -> %s" % (initializer, symbol, function))
else:
print("%s %s" % (initializer, symbol))
C++ static initializers are not called directly, but through two generated functions, _GLOBAL__sub_I_..
and __static_initialization..
. The script uses the disassembly of those functions to get the name of the actual constructor. You'll need the c++filt
tool to unmangle the names, or remove the call from the script to see the raw symbol name.
Shared libraries can have their own initializer lists, which would not be displayed by this script. The situation is slightly more complicated there: For non-static initializers, the .init_array
gets an all-zero entry that is overwritten with the final address of the initializer when loading the library. So this script would output an address with all zeros.
There are multiple things executed when loading an ELF object, not just .init_array
. To get an overview, I suggest looking at the sources of libc's loader, especially _dl_init()
and call_init()
.