wrong memory locations when debugging in qemu with

Published 2020-04-21 02:58

Question:

I'm writing a small kernel in assembly and running it in QEMU. I've hit some bugs, so I want to debug the kernel with gdb. I assembled it like this:

$ nasm -g -f elf -o myos.elf myos.asm
$ objcopy --only-keep-debug myos.elf myos.sym
$ objcopy -O binary myos.elf myos.bin

Then I run it in QEMU with:

$ qemu-system-i386 -s -S myos.bin

Then I connect with gdb:

$ gdb
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) symbol-file myos.sym
Reading symbols from /home/sven/Projekte/myos/myos.sym...done.

I have a label named welcome in my kernel that points to a string. While testing I tried to look at that string, which gave the following result:

(gdb) x/32b welcome
0x1e <welcome>: 0x00    0xf0    0xa5    0xfe    0x00    0xf0    0x87    0xe9
0x26:   0x00    0xf0    0x6e    0xc9    0x00    0xf0    0x6e    0xc9
0x2e:   0x00    0xf0    0x6e    0xc9    0x00    0xf0    0x6e    0xc9
0x36:   0x00    0xf0    0x57    0xef    0x00    0xf0    0x6e

The label is defined like this:

welcome: db "System started. Happy hacking!", 10, 0

As you can see, gdb claims that welcome starts with a null byte, which contradicts the definition. However, the kernel uses the label correctly, so it doesn't seem to be a problem with my code. Other parts of memory that I examined don't match the loaded kernel either.

Does anyone know why the memory of the virtual machine doesn't match the loaded kernel, while the machine still behaves correctly?

Answer 1:

Explanation

  • qemu-system-i386 loads the first byte of an x86 boot sector image file at address 0x7c00 at run time.
  • Your ELF files (myos.elf, myos.sym) mistakenly inform GDB that the code would be loaded at address 0. Thus GDB thinks welcome is at 0x1e while it's actually at 0x7c1e.
  • Adding 0x7c00 to all addresses in GDB would work but is clumsy: x/32xb (welcome + 0x7c00)
  • A better solution is to create an ELF file with the right addresses.
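The address arithmetic behind these points can be sketched in a few lines of Python. The helper name runtime_address and its parameters are made up for illustration; only the 0x7c00 load address and the 0x1e symbol address come from the discussion above.

```python
BOOT_SECTOR_BASE = 0x7C00  # where the boot sector ends up in memory


def runtime_address(symbol_address, link_base=0, load_base=BOOT_SECTOR_BASE):
    """Map an address from the symbol file to the address used at run time.

    Hypothetical helper: link_base is the address the ELF file was built
    for, load_base is where the code really sits in memory.
    """
    return symbol_address - link_base + load_base


# Symbols built for address 0 (the original myos.sym): welcome is off by 0x7c00.
print(hex(runtime_address(0x1E)))

# Symbols built for 0x7c00 (an ELF file with the right addresses): no correction.
print(hex(runtime_address(0x7C1E, link_base=0x7C00)))
```

Both calls print 0x7c1e, which is why fixing the addresses in the ELF file makes the manual offset unnecessary.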

Solution

boot.asm

; 'boot.asm'
; loaded by BIOS

[bits 16]

global main
main:
mov di, welcome
print_welcome:
mov ah, 0x0e
mov al, [di]
int 0x10
inc di
cmp byte [di], 0
jne print_welcome
hlt

db "XXXXXXXXXXXXXX" ; some padding to make welcome appear at 0x1e
welcome: db "System started. Happy hacking!", 10, 0

; x86 boot sector padding and signature
; NOTE: intentionally commented out. Will be added by linker script
;times 510 - ($ - $$) db 0x00
;db 0x55, 0xAA

x86-boot.ld

ENTRY(main);
SECTIONS
{
    . = 0x7C00;
    .text : AT(0x7C00)
    {
        _text = .;
        *(.text);
        _text_end = .;
    }
    .data :
    {
        _data = .;
        *(.bss);
        *(.bss*);
        *(.data);
        *(.rodata*);
        *(COMMON)
        _data_end = .;
    }
    .sig : AT(0x7DFE)
    {
        SHORT(0xaa55);
    }
    /DISCARD/ :
    {
        *(.note*);
        *(.iplt*);
        *(.igot*);
        *(.rel*);
        *(.comment);
/* add any unwanted sections spewed out by your version of gcc and flags here */
    }
}

Build the code with:

nasm -g -f elf -F dwarf boot.asm -o boot.o
cc -nostdlib -m32 -T x86-boot.ld -Wl,--build-id=none  boot.o -o boot
objcopy -O binary boot boot.good.bin
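Before booting the image, it can be worth checking that the flat binary really is one boot sector. The following Python sketch assumes the linker script above did its job: .sig is placed at 0x7DFE, which is 0x7DFE - 0x7C00 = 510 bytes into the image, so the file should be 512 bytes long and end in 0x55 0xAA. The function names are hypothetical.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"  # SHORT(0xaa55) stored little-endian
SIGNATURE_OFFSET = 0x7DFE - 0x7C00  # 510


def boot_sector_problems(image):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(image) != SECTOR_SIZE:
        problems.append("image is %d bytes, expected %d" % (len(image), SECTOR_SIZE))
    if image[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != SIGNATURE:
        problems.append("no 0x55 0xAA signature at offset %d" % SIGNATURE_OFFSET)
    return problems


def check_file(path):
    """Read a flat binary such as boot.good.bin and list what is wrong with it."""
    with open(path, "rb") as f:
        return boot_sector_problems(f.read())


# Self-contained demonstration with synthetic images instead of a real file.
good = bytes(510) + SIGNATURE
bad = bytes(510)  # what you would get without padding and signature
print(boot_sector_problems(good))
print(boot_sector_problems(bad))
```

If the build went as described, check_file("boot.good.bin") would be expected to return an empty list.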

dump-welcome.gdb

target remote localhost:1234
symbol-file boot
monitor system_reset
# run until hlt instruction, address obtained through disassembly
until *0x7c0f
x/32xb welcome

monitor quit
disconnect
quit

Sample session:

$ qemu-system-x86_64 -s -S boot.good.bin &
$ gdb -q -x dump-welcome.gdb
0x0000fff0 in ?? ()
main () at boot.asm:16
16  hlt
0x7c1e <welcome>:   0x53    0x79    0x73    0x74    0x65    0x6d    0x20    0x73
0x7c26: 0x74    0x61    0x72    0x74    0x65    0x64    0x2e    0x20
0x7c2e: 0x48    0x61    0x70    0x70    0x79    0x20    0x68    0x61
0x7c36: 0x63    0x6b    0x69    0x6e    0x67    0x21    0x0a    0x00

Thought Process

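More than half of the bytes you dumped have values ≥ 0x80, and most of the rest are 0x00, so hardly any of them are printable ASCII characters. The repeating 0x00 0xf0 pairs also look like the segment halves of real-mode interrupt vectors (segment 0xf000), which live at the bottom of memory. This raises the question: am I really dumping the right address?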
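This sanity check is easy to automate. The Python sketch below counts printable ASCII bytes in the dump from the question and in the string the label is supposed to hold; the helper name printable_count is an assumption made for illustration.

```python
# Byte values as they appear in the x/32b output in the question.
DUMPED = bytes([
    0x00, 0xF0, 0xA5, 0xFE, 0x00, 0xF0, 0x87, 0xE9,
    0x00, 0xF0, 0x6E, 0xC9, 0x00, 0xF0, 0x6E, 0xC9,
    0x00, 0xF0, 0x6E, 0xC9, 0x00, 0xF0, 0x6E, 0xC9,
    0x00, 0xF0, 0x57, 0xEF, 0x00, 0xF0, 0x6E,
])

# What the welcome label is defined to contain: the text, a newline, a NUL.
EXPECTED = b"System started. Happy hacking!\n\x00"


def printable_count(data):
    """Count bytes in the printable ASCII range 0x20..0x7e."""
    return sum(1 for b in data if 0x20 <= b <= 0x7E)


for name, data in (("dumped", DUMPED), ("expected", EXPECTED)):
    print("%s: %d of %d bytes printable" % (name, printable_count(data), len(data)))
```

Only 6 of the 31 dumped bytes are printable, against 30 of 32 for the real string, which is a strong hint that the dump comes from somewhere else.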

The hex dump of your welcome message should be:

$ python3 -c 's = "System started. Happy hacking!"; print([hex(ord(x)) for x in s])'
['0x53', '0x79', '0x73', '0x74', '0x65', '0x6d', '0x20', '0x73', '0x74', '0x61', '0x72', '0x74', '0x65', '0x64', '0x2e', '0x20', '0x48', '0x61', '0x70', '0x70', '0x79', '0x20', '0x68', '0x61', '0x63', '0x6b', '0x69', '0x6e', '0x67', '0x21']

Using GDB to search for the welcome message in memory would have revealed the right address as well:

(gdb) find 0, 0xffff, 'S', 'y', 's', 't'
0x7c1e
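A similar search can be done offline on the flat binary, without a running VM. The Python sketch below is only an approximation of what GDB's find does on live memory: it scans a byte string and adds the assumed load address 0x7c00. The function name find_addresses is hypothetical, and the image is a synthetic stand-in rather than the real boot.good.bin.

```python
def find_addresses(image, pattern, load_base=0x7C00):
    """Return the run-time address of every occurrence of pattern in image."""
    addresses = []
    offset = image.find(pattern)
    while offset != -1:
        addresses.append(load_base + offset)
        offset = image.find(pattern, offset + 1)
    return addresses


# Synthetic boot sector: 0x1e placeholder bytes where the code and padding
# would be, then the message, zero fill, and the boot signature.
message = b"System started. Happy hacking!\n\x00"
image = bytes(0x1E) + message
image += bytes(510 - len(image)) + b"\x55\xaa"

print([hex(a) for a in find_addresses(image, b"Syst")])
```

For the synthetic image this prints ['0x7c1e']. To try it on a real build, the bytes of boot.good.bin could be read with open(..., "rb") and passed in instead.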

Further Reading

  • GNU LD: Basic Linker Script Concepts: see discussion on LMA vs. VMA
  • Real mode in C with gcc : writing a bootloader: source of the linker script above. Shows some cool GNU toolchain tricks for x86 real mode development.