
ELF program header virtual address and file offset

Posted 2020-07-17 07:39

Question:

I know the relationship between the two:

virtual address mod page alignment == file offset mod page alignment

But can someone tell me in which direction are these two numbers computed?

Is virtual address computed from file offset according to the relationship above, or vice versa?

Update

Here is some more detail: when the linker writes the ELF program headers, it sets the virtual address and file offset of each segment.

For example, here is the output of readelf -l someELFfile:

Elf file type is EXEC (Executable file)
Entry point 0x8048094
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x00154 0x00154 R E 0x1000
  LOAD           0x000154 0x08049154 0x08049154 0x00004 0x00004 RW  0x1000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

We can see 2 LOAD segments.

The virtual address of the first LOAD ends at 0x8048154, while the second LOAD starts at 0x8049154.

In the ELF file, the second LOAD is right behind the first LOAD with file offset 0x00154, however when this ELF is loaded into memory it starts at 0x1000 bytes after the end of the first LOAD segment.

But why? If we have to consider memory page alignment, why doesn't the second LOAD segment start at 0x08049000? Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?

I know the virtual address of the second LOAD satisfies the relationship:

virtual address mod page alignment == file offset mod page alignment

But I don't know why this relationship must be satisfied.

Answer 1:

Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?

If it didn't, it would have to start at 0x08048154, but it can't: the two LOAD segments have different flags specified for their mapping (the first is mapped with PROT_READ|PROT_EXEC, the second with PROT_READ|PROT_WRITE). Protections (being part of the page table) can only apply to whole pages, not parts of a page. Therefore, mappings with different protections must belong to different pages.
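
To make the page-sharing problem concrete, here is a small sketch that lists which pages each segment would touch. The 0x1000 page size is taken from the Align column above; the helper name pages_touched is made up for this illustration.

```python
PAGE_SIZE = 0x1000  # assumed page size, matching the Align column in the readelf output


def pages_touched(vaddr, memsz, page_size=PAGE_SIZE):
    """Return the set of page numbers covered by [vaddr, vaddr + memsz)."""
    first = vaddr // page_size
    last = (vaddr + memsz - 1) // page_size
    return set(range(first, last + 1))


# First LOAD (R E), as shown by readelf.
text_pages = pages_touched(0x08048000, 0x154)
# Hypothetical placement: second LOAD packed right behind the first one.
packed_pages = pages_touched(0x08048154, 0x4)
# Actual placement chosen by the linker.
actual_pages = pages_touched(0x08049154, 0x4)

# Pages that would need two different protections at once.
print(sorted(hex(p) for p in text_pages & packed_pages))
print(sorted(hex(p) for p in text_pages & actual_pages))
```

With the packed placement both segments land on page 0x8048, which cannot be both R E and RW; with the actual placement the intersection is empty.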

virtual address mod page alignment == file offset mod page alignment
But I don't know why this relationship must be satisfied.

The LOAD segments are directly mmapped from the file. The actual mapping of the second LOAD segment performed for your example will look something like this (you can run your program under strace and see that it does):

mmap(0x08049000, 0x158, PROT_READ|PROT_WRITE, MAP_PRIVATE, $fd, 0)
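
The address, length and offset in that call follow from the program header by rounding down to a page boundary. A rough sketch of the arithmetic, assuming a 0x1000 page size (mmap_args is a hypothetical helper, not a real loader function):

```python
def mmap_args(offset, vaddr, filesz, page_size=0x1000):
    """Round a segment down to page boundaries, roughly as a loader must before calling mmap."""
    slack = vaddr & (page_size - 1)  # bytes between the page start and the segment start
    if slack != offset & (page_size - 1):
        raise ValueError("VirtAddr and Offset are not congruent modulo the page size")
    addr = vaddr - slack        # page-aligned mapping address
    length = filesz + slack     # the mapping must also cover the bytes before the segment
    file_off = offset - slack   # page-aligned file offset
    return addr, length, file_off


# Second LOAD segment from the readelf output above.
addr, length, file_off = mmap_args(offset=0x154, vaddr=0x08049154, filesz=0x4)
print(hex(addr), hex(length), hex(file_off))
```

For the second LOAD segment this gives 0x8049000, 0x158 and 0x0, the same values as in the mmap call above. If the two low 12-bit remainders differed, no single page-aligned (address, offset) pair could put the file bytes at the requested virtual address.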

If you try to make the virtual address or the offset non-page-aligned, mmap will fail with EINVAL. The only way to make file data appear in virtual memory at the desired address is to make VirtAddr congruent to Offset modulo Align, and that is exactly what the static linker does.
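
As an illustration only (this is not taken from any particular linker's source, and next_segment_vaddr is a made-up name), one way to arrive at an address that satisfies both constraints is to move to the next page boundary after the previous segment and then add the file offset's remainder:

```python
def next_segment_vaddr(prev_end, file_offset, align=0x1000):
    """Smallest address at or after the next page boundary that is congruent to file_offset mod align."""
    base = (prev_end + align - 1) // align * align  # round prev_end up to a page boundary
    return base + (file_offset % align)             # keep VirtAddr == Offset (mod align)


# The first LOAD ends at 0x08048154; the second LOAD sits at file offset 0x154.
vaddr = next_segment_vaddr(prev_end=0x08048154, file_offset=0x154)
print(hex(vaddr))
```

For the example in the question this yields 0x8049154, the VirtAddr that readelf reports for the second LOAD segment: a fresh page for the new protections, and the same low 12 bits as the file offset.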

Note that for such a small first LOAD segment, the entire first segment also appears at the beginning of the second mapping (with the wrong protections). But the program is not supposed to access anything in the [0x08049000,0x08049154) range. In general, it is almost always the case that there is some "junk" before the start of actual data in the second LOAD segment (unless you get really lucky and the first LOAD segment ends on a page boundary).

See also the mmap man page.