Not clear on the job of the linker

Posted 2019-05-02 08:44

Question:

I'm using the C language on Windows. This question was previously part of "What happens to identifiers in a program?". I split it out to reduce the number of questions in that post. This is a standalone query (it doesn't depend on the previous question).

If there is nothing to link (i.e. I'm not using any libraries; I know that wouldn't be of much use), will the linker change the object code output by the assembler? If so, what does it change?

I heard that the linker also does some kind of memory mapping. I don't understand how. The program is not running; it's just in the manufacturing stage. How could the linker map it to memory? What would that look like? What are all the functions of the linker?

When people refer to "relocation" or "address binding", I don't really get what they mean. What is it, and what is its purpose?

Some debuggers show info like: call stack: 0xfffef32, 0xf3234fe, etc. That's at run time, right? Or are those the memory addresses from the linker's so-called "memory mapping"?

When people refer to something like symbols or a symbol table, do they mean identifiers (variable names, constant names, function names)?

I searched the internet but couldn't find anything useful. Maybe I'm not sure what to search for. I don't want to read big books on this, but if there are any articles or tutorials that clear up these concepts, those would also be helpful.

I'm a novice programmer, so it would be great if you could explain in simple but technical terms.

Answer 1:

When you compile a source file, it is usually divided up by the compiler/assembler into several sections. As a hypothetical example imagine that the following sections are used:

  • .text - contains all the executable code
  • .const - contains constant data
  • .data - contains read/write initialized data
  • .bss - contains read/write uninitialized data

In a single source file, the compiler/assembler places each piece of code or data in the appropriate section and gives each symbol an offset within its section, starting from zero.

For example:

int i;
const int j = 3;
int k = 4;
int l;
int main(void)
{
    return 1;
}

This could result in the following symbol table:

Symbol Section Offset
i      .bss    0
j      .const  0
k      .data   0
l      .bss    4
main   .text   0

In the object file, in addition to the symbol table, the data in each section could be kept. In this example, the .text section would contain the object code for "return 1", the .const section would contain 3, and the .data section would contain 4. The contents of the .bss section would not need to be in the object file, because its variables haven't been given initial values; only the section's size has to be recorded.
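As a rough sketch of how the offsets in that table could come about, the following C code hands each symbol the next free offset in its section. The type and function names (struct symbol, assign_offsets, find_symbol) are made up for illustration, and the symbol sizes are assumptions (4 bytes per int); a real compiler also deals with alignment and many more sections.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical section identifiers, mirroring the example sections. */
enum section { SEC_TEXT, SEC_CONST, SEC_DATA, SEC_BSS, SEC_COUNT };

struct symbol {
    const char  *name;
    enum section section;
    unsigned     size;    /* bytes occupied; the caller supplies assumed sizes */
    unsigned     offset;  /* filled in by assign_offsets() */
};

/* Give every symbol the next free offset in its section, starting at zero. */
void assign_offsets(struct symbol *syms, size_t count)
{
    unsigned next_free[SEC_COUNT] = {0};

    for (size_t n = 0; n < count; n++) {
        assert(syms[n].section < SEC_COUNT);
        syms[n].offset = next_free[syms[n].section];
        next_free[syms[n].section] += syms[n].size;
    }
}

/* Find a symbol by name; returns NULL if the table does not contain it. */
const struct symbol *find_symbol(const struct symbol *syms, size_t count,
                                 const char *name)
{
    for (size_t n = 0; n < count; n++) {
        if (strcmp(syms[n].name, name) == 0)
            return &syms[n];
    }
    return NULL;
}
```

Feeding it i, j, k, l and main in source order, with 4-byte ints, gives l the offset 4 in .bss and every other symbol the offset 0, matching the table.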

The first thing a linker might do is to concatenate the like-named sections of all the input object files and adjust the symbol offsets accordingly.

Now we get to what is called "relocation" or "address binding". Let's say that in a hypothetical system, executable code starts at address 0x1000. Let's also say that the data sections of a program want to start at an even page boundary after the executable code. The linker would assign 0x1000 as the base of the concatenated .text sections and adjust all the symbols. It would then assign the bases of the .const, .data, and .bss sections similarly, to place them at appropriate places in memory.
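The address assignment just described can be sketched in a few lines of C. Everything here is an assumption made for the example: the names align_up and layout_sections, a page size passed in by the caller, and the choice to pack .const, .data and .bss back to back after the first page boundary. A real linker takes this layout from the target's conventions or a linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical section indices for the four example sections. */
enum { SEC_TEXT, SEC_CONST, SEC_DATA, SEC_BSS, SEC_COUNT };

/* Round addr up to the next multiple of align (a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Compute a base address for each concatenated section.
 * .text starts at text_base; the data sections start at the first page
 * boundary after the code and follow each other without gaps. */
void layout_sections(const uint32_t size[SEC_COUNT], uint32_t text_base,
                     uint32_t page_size, uint32_t base[SEC_COUNT])
{
    base[SEC_TEXT]  = text_base;
    base[SEC_CONST] = align_up(text_base + size[SEC_TEXT], page_size);
    base[SEC_DATA]  = base[SEC_CONST] + size[SEC_CONST];
    base[SEC_BSS]   = base[SEC_DATA] + size[SEC_DATA];
}
```

Once the bases are known, a symbol's final address is its section's base plus its offset within that section. With text_base 0x1000, a page size of 0x1000 and 0x234 bytes of code, the data sections would start at 0x2000.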

Sometimes there are symbolic references in a section. These references have to be updated by the linker to reflect the final position of the symbol referred to. The object file could contain "relocation records" that look like

section offset symbol
.text   0x1234 foo

The linker will go to each offset in each section and update the value there to reflect the final symbol value.
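To make that patching step concrete, here is a minimal sketch in C. It assumes the simplest possible record: a 32-bit absolute address stored little-endian at the given offset. The struct layout and the names apply_relocations and read_u32_le are invented for the example; real object formats (COFF, ELF) define several relocation types, including relative ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: "write the final address of symbol
 * number `symbol` at byte `offset` of this section". */
struct reloc {
    uint32_t offset;
    size_t   symbol;  /* index into the table of final symbol addresses */
};

/* Patch each recorded location with the final address of its symbol,
 * stored as a 32-bit little-endian value. */
void apply_relocations(uint8_t *section, size_t section_size,
                       const struct reloc *relocs, size_t nrelocs,
                       const uint32_t *symbol_addr)
{
    for (size_t n = 0; n < nrelocs; n++) {
        size_t   at    = relocs[n].offset;
        uint32_t value = symbol_addr[relocs[n].symbol];

        assert(at + 4 <= section_size);  /* the patch must fit in the section */
        section[at + 0] = (uint8_t)(value & 0xFF);
        section[at + 1] = (uint8_t)((value >> 8) & 0xFF);
        section[at + 2] = (uint8_t)((value >> 16) & 0xFF);
        section[at + 3] = (uint8_t)((value >> 24) & 0xFF);
    }
}

/* Read back a 32-bit little-endian value, e.g. to check a patch. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The bytes around the patched location are left untouched; only the placeholder the compiler left for the address changes.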

After all this is done, the resulting "absolute" object file can be loaded into memory (at the proper spot, of course!) and executed.



Answer 2:

I'll work with C for this discussion.

It's rare for a C program to not refer to at least some library functions; so even if your code is in just one module (file) there will usually be references to library functions. In the compiled form of your program, those references are in an external references table, i.e. a table in which the textual names appear along with the locations in your program that want to refer to those external addresses.

The linker's job is to combine your program into a single file with any other modules it uses, and then to match external definitions in one module with external references in another, i.e. to patch all cross-references within the program so that calls hit the correct addresses.

Even if you don't reference any external modules, the linker will probably need to convert some relative references in your code into absolute ones; i.e. once it "knows" where in the file your code is going to be sitting, it can assign the correct final addresses to things.
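The matching of external references against external definitions might be sketched like this in C. The struct layouts and the function name resolve_references are hypothetical, and a real linker uses hash tables and handles libraries, weak symbols and duplicates; this only shows the core idea of looking each referenced name up among the names other modules define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A name some module defines, with its final address (assumed known). */
struct definition {
    const char *name;
    uint32_t    address;
};

/* A name some module uses but does not define itself. */
struct reference {
    const char *name;
    uint32_t    address;   /* valid only when resolved is non-zero */
    int         resolved;
};

/* Match every reference against the definitions by name.
 * Returns the number of references that stayed unresolved. */
size_t resolve_references(const struct definition *defs, size_t ndefs,
                          struct reference *refs, size_t nrefs)
{
    size_t unresolved = 0;

    assert(defs != NULL && refs != NULL);
    for (size_t r = 0; r < nrefs; r++) {
        refs[r].resolved = 0;
        for (size_t d = 0; d < ndefs; d++) {
            if (strcmp(refs[r].name, defs[d].name) == 0) {
                refs[r].address  = defs[d].address;
                refs[r].resolved = 1;
                break;
            }
        }
        if (!refs[r].resolved)
            unresolved++;
    }
    return unresolved;
}
```

A non-zero return value corresponds to the familiar "undefined reference" or "unresolved external symbol" error that a linker reports when no module supplies a name.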



Answer 3:

Not an answer, just a suggestion: buy "Linkers and Loaders", read it a few times. It's amazingly helpful.



Answer 4:

If there is nothing to link (i.e. I'm not using any libraries; I know that wouldn't be of much use), will the linker change the object code output by the assembler? If so, what does it change?

It always links in some initialization code. You can try this: write an empty program, link it, and then use objdump -d to disassemble the result.

I heard that the linker also does some kind of memory mapping. I don't understand how. The program is not running; it's just in the manufacturing stage. How could the linker map it to memory? What would that look like? What are all the functions of the linker?

Each system has a memory layout that executable programs must follow to work. It specifies where the different parts of the program go (at least code, initialized data, and data initialized to zero). The linker must produce the executable according to these rules, which vary between systems, e.g. Windows and Linux. On embedded systems it gets even more interesting: there the program is typically in read-only memory (Flash) and the data is in RAM, and there are fixed address ranges for the different kinds of memory depending on the type of microcontroller.

When people refer to "relocation" or "address binding", I don't really get what they mean. What is it, and what is its purpose?

Binding in general means giving a value to a name, in this case an address to the symbol for a function or global variable.

As for relocation, you typically link together more than one object file, and each object file specifies its addresses as offsets relative to its beginning. When you put them together each gets its own address range, and the linker computes the address for a symbol by mapping the offset into the address range. This is called relocation.
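A small sketch of that mapping in C, under simplifying assumptions: the object files are placed back to back with no padding, and the names place_objects and relocate are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Give each object file its own address range, one after another,
 * starting at load_base. base[n] receives the start of file n's range.
 * Returns the first address after the last file. */
uint32_t place_objects(const uint32_t *size, size_t count,
                       uint32_t load_base, uint32_t *base)
{
    uint32_t next = load_base;

    for (size_t n = 0; n < count; n++) {
        base[n] = next;
        next += size[n];
    }
    return next;
}

/* Map an offset relative to the start of an object file into that
 * file's address range. */
uint32_t relocate(uint32_t object_base, uint32_t object_size, uint32_t offset)
{
    assert(offset < object_size);  /* the offset must lie inside the file */
    return object_base + offset;
}
```

For instance, if two files of 0x100 and 0x80 bytes are placed from 0x1000, a symbol at offset 0x10 in the second file ends up at 0x1110.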

Some debuggers show info like: call stack: 0xfffef32, 0xf3234fe, etc. That's at run time, right? Or are those the memory addresses from the linker's so-called "memory mapping"?

That 0xfffef32 would be a typical address on the stack, as the stack usually is put at the top of the memory and grows downwards. The stack is used for return addresses, local variables and actual function parameters. These are local and stored at addresses relative to the stack pointer, so they aren't typically handled by the linker, rather the compiler already knows the offsets to use and puts them in the assembly code.

When people refer to something like symbols or a symbol table, do they mean identifiers (variable names, constant names, function names)?

The symbol table is a table which maps symbols to values (numbers, offsets, addresses). There are some symbols for your identifiers, but also more for other uses. Your identifiers may be modified more or less to become symbols, mostly to prevent name clashes (e.g. the prepending of "_").
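As an illustration of the simplest such modification, the sketch below prepends an underscore to an identifier, which is what 32-bit Windows toolchains do for ordinary C functions. The function name decorate_symbol is made up; other calling conventions and C++ use more elaborate schemes that this does not cover.

```c
#include <assert.h>
#include <stdio.h>

/* Build the symbol name for a C identifier by prepending "_".
 * Returns 1 on success, 0 if the result did not fit into out. */
int decorate_symbol(const char *identifier, char *out, size_t out_size)
{
    assert(identifier != NULL && out != NULL);

    int written = snprintf(out, out_size, "_%s", identifier);
    return written >= 0 && (size_t)written < out_size;
}
```

With this scheme the identifier main becomes the symbol _main, so a name written in assembly without the underscore cannot clash with it.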

The GNU linker (ld) has an option --print-map that prints a link map, which shows where the sections and symbols were placed. You can use -Wl,--print-map if you use gcc for linking.

If you like this kind of low-level technical stuff, you should take a look at embedded programming, i.e. programming the microcontrollers that are used in various electric devices. For desktop systems like Windows you don't normally need to look at these kinds of details.