How to find global static initializations

2019-02-14 17:09发布

问题:

I just read this excellent article: http://neugierig.org/software/chromium/notes/2011/08/static-initializers.html and then I tried: https://gcc.gnu.org/onlinedocs/gccint/Initialization.html

What it says about finding initializers does not work for me though. The .ctors section is not available, but I could find .init_array (see also Can't find .dtors and .ctors in binary). But how do I interpret the output? I mean, summing up the size of the pages can also be handled by the size command and its .bss column - or am I missing something?

Furthermore, nm does not report any *_GLOBAL__I_* symbols, only *_GLOBAL__N_* functions, and - more interesting - _GLOBAL__sub_I_somefile.cpp entries. The latter probably indicates files with global initialization. But can I somehow get a list of constructors that are being run? Ideally, a tool would give me a list of

Foo::Foo in file1.cpp:12
Bar::Bar in file2.cpp:45
...

(assuming I have debug symbols available). Is there such a tool? If not, how could one write it? Does the .init_array section contain pointers to code which could be translated via some DWARF magic to the above?

回答1:

As you already observed, the implementation details of contructors/initialization functions are highly compiler (version) dependent. While I am not aware of a tool for this, what current GCC/clang versions do is simple enough to let a small script do the job: .init_array is just a list of entry points. objdump -s can be used to load the list, and nm to lookup the symbol names. Here's a Python script that does that. It should work for any binary that was generated by the said compilers:

#!/usr/bin/env python
import os
import sys

# Load .init_array section
objdump_output = os.popen("objdump -s '%s' -j .init_array" % (sys.argv[1].replace("'", r"\'"),)).read()
is_64bit = "x86-64" in objdump_output
init_array = objdump_output[objdump_output.find("Contents of section .init_array:") + 33:]
initializers = []
for line in init_array.split("\n"):
    parts = line.split()
    if not parts:
        continue
    parts.pop(0)  # Remove offset
    parts.pop(-1) # Remove ascii representation

    if is_64bit:
        # 64bit pointers are 8 bytes long
        parts = [ "".join(parts[i:i+2]) for i in range(0, len(parts), 2) ]

    # Fix endianess
    parts = [ "".join(reversed([ x[i:i+2] for i in range(0, len(x), 2) ])) for x in parts ]

    initializers += parts

# Load disassembly for c++ constructors
dis_output = os.popen("objdump -d '%s' | c++filt" % (sys.argv[1].replace("'", r"\'"), )).read()
def find_associated_constructor(disassembly, symbol):
    # Find associated __static_initialization function
    loc = disassembly.find("<%s>" % symbol)
    if loc < 0:
        return False
    loc = disassembly.find(" <", loc)
    if loc < 0:
        return False
    symbol = disassembly[loc+2:disassembly.find("\n", loc)][:-1]
    if symbol[:23] != "__static_initialization":
        return False
    address = disassembly[disassembly.rfind(" ", 0, loc)+1:loc]
    loc = disassembly.find("%s <%s>" % (address, symbol))
    if loc < 0:
        return False
    # Find all callq's in that function
    end_of_function = disassembly.find("\n\n", loc)
    symbols = []
    while loc < end_of_function:
        loc = disassembly.find("callq", loc)
        if loc < 0 or loc > end_of_function:
            break
        loc = disassembly.find("<", loc)
        symbols.append(disassembly[loc+1:disassembly.find("\n", loc)][:-1])
    return symbols

# Load symbol names, if available
nm_output = os.popen("nm '%s'" % (sys.argv[1].replace("'", r"\'"), )).read()
nm_symbols = {}
for line in nm_output.split("\n"):
    parts = line.split()
    if not parts:
        continue
    nm_symbols[parts[0]] = parts[-1]

# Output a list of initializers
print("Initializers:")
for initializer in initializers:
    symbol = nm_symbols[initializer] if initializer in nm_symbols else "???"
    constructor = find_associated_constructor(dis_output, symbol)
    if constructor:
        for function in constructor:
            print("%s %s -> %s" % (initializer, symbol, function))
    else:
        print("%s %s" % (initializer, symbol))

C++ static initializers are not called directly, but through two generated functions, _GLOBAL__sub_I_.. and __static_initialization... The script uses the disassembly of those functions to get the name of the actual constructor. You'll need the c++filt tool to unmangle the names, or remove the call from the script to see the raw symbol name.

Shared libraries can have their own initializer lists, which would not be displayed by this script. The situation is slightly more complicated there: For non-static initializers, the .init_array gets an all-zero entry that is overwritten with the final address of the initializer when loading the library. So this script would output an address with all zeros.



回答2:

There are multiple things executed when loading an ELF object, not just .init_array. To get an overview, I suggest looking at the sources of libc's loader, especially _dl_init() and call_init().