how __libc_start_main@plt works?

2020-07-13 07:43发布

问题:

To study how the object file loaded and run in linux, I made the simplest c code, file name simple.c.

int main(){}

Next, I make object file and save object file as text file.

$gcc ./simple.c 
$objdump -xD ./a.out > simple.text

From many internet articles, I could catch that gcc dynamically load initiating functions like _start, _init, __libc_start_main@plt, and so on. So I started to read my assembly code, helped by http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html .

Here is the some part of assembly code.

080482e0 <__libc_start_main@plt>:
 80482e0:       ff 25 10 a0 04 08       jmp    *0x804a010
 80482e6:       68 08 00 00 00          push   $0x8
 80482eb:       e9 d0 ff ff ff          jmp    80482c0 <_init+0x2c>

Disassembly of section .text:

080482f0 <_start>:
 80482f0:       31 ed                   xor    %ebp,%ebp
 80482f2:       5e                      pop    %esi
 80482f3:       89 e1                   mov    %esp,%ecx
 80482f5:       83 e4 f0                and    $0xfffffff0,%esp
 80482f8:       50                      push   %eax
 80482f9:       54                      push   %esp
 80482fa:       52                      push   %edx
 80482fb:       68 70 84 04 08          push   $0x8048470
 8048300:       68 00 84 04 08          push   $0x8048400
 8048305:       51                      push   %ecx
 8048306:       56                      push   %esi
 8048307:       68 ed 83 04 08          push   $0x80483ed
 804830c:       e8 cf ff ff ff          call   80482e0 <__libc_start_main@plt>
 8048311:       f4                      hlt
 8048312:       66 90                   xchg   %ax,%ax
 8048314:       66 90                   xchg   %ax,%ax
 8048316:       66 90                   xchg   %ax,%ax
 8048318:       66 90                   xchg   %ax,%ax
 804831a:       66 90                   xchg   %ax,%ax
 804831c:       66 90                   xchg   %ax,%ax
 804831e:       66 90                   xchg   %ax,%ax

080483ed <main>:
 80483ed:       55                      push   %ebp
 80483ee:       89 e5                   mov    %esp,%ebp
 80483f0:       b8 00 00 00 00          mov    $0x0,%eax
 80483f5:       5d                      pop    %ebp
 80483f6:       c3                      ret
 80483f7:       66 90                   xchg   %ax,%ax
 80483f9:       66 90                   xchg   %ax,%ax
 80483fb:       66 90                   xchg   %ax,%ax
 80483fd:       66 90                   xchg   %ax,%ax
 80483ff:       90                      nop



 ...

Disassembly of section .got:

08049ffc <.got>:
 8049ffc:       00 00                   add    %al,(%eax)
        ...

Disassembly of section .got.plt:

0804a000 <_GLOBAL_OFFSET_TABLE_>:
 804a000:       14 9f                   adc    $0x9f,%al
 804a002:       04 08                   add    $0x8,%al
        ...
 804a00c:       d6                      (bad)
 804a00d:       82                      (bad)
 804a00e:       04 08                   add    $0x8,%al
 804a010:       e6 82                   out    %al,$0x82
 804a012:       04 08                   add    $0x8,%al

My question is;

In 0x804830c, 0x80482e0 is called (I've already apprehended the previous instructions.).

In 0x80482e0, the process jump to 0x804a010.

In 0x804a010, the instruction is < out %al,$0x82 >

...wait. just out? What was in the %al and where is 0x82?? I got stuck in this line.

Please help....

*p.s. I'm beginner to linux and operating system. I'm studying operating system concepts by school class, but still can not find how to study proper linux assembly language. I've already downloaded intel processor manual but it is too huge to read. Can anyone inform me good material for me? Thanks.

回答1:

80482e0:       ff 25 10 a0 04 08       jmp    *0x804a010

This means "retrieve the 4-byte address stored at 0x804a010 and jump to it."

804a010:       e6 82                   out    %al,$0x82
804a012:       04 08                   add    $0x8,%al

Those 4 bytes will be treated as an address, 0x80482e6, not as instructions.

80482e0:       ff 25 10 a0 04 08       jmp    *0x804a010
80482e6:       68 08 00 00 00          push   $0x8
80482eb:       e9 d0 ff ff ff          jmp    80482c0 <_init+0x2c>

So we've just executed an instruction that has moved us exactly one instruction forward. At this point, you're probably wondering if there's a good reason for this.

There is. This is a typical PLT/GOT implementation. Much more detail, including a diagram, is at Position Independent Code in shared libraries: The Procedure Linkage Table.

The real code for __libc_start_main is in a shared library, glibc. The compiler and compile-time linker don't know where the code will be at run-time, so they place in your compiled program a short __libc_start_main function which contains just three instructions:

  • jump to a location specified by the 4th (or 5th, depending on whether you like to count from 0 or 1) entry in the GOT
  • push $8 onto the stack
  • jump to a resolver routine

The first time you call __libc_start_main, the resolver code will run. It will find the actual location of __libc_start_main in a shared library and will patch the 4th entry of the GOT to be that address. If your program calls __libc_start_main again, the jmp *0x804a010 instruction will take the program directly to the code in the shared library.

Can anyone inform me good material for me?

The x86 Assembly book at Wikibooks might be one place to start.