When I test GCC inline assembly, I use the `test` function to display a character on the screen in the BOCHS emulator. The code runs in 32-bit protected mode and is as follows:
    test() {
        char ch = 'B';
        __asm__ ("mov $0x10, %%ax\n\t"
                 "mov %%ax, %%es\n\t"
                 "movl $0xb8000, %%ebx\n\t"
                 "mov $0x04, %%ah\n\t"
                 "mov %0, %%al\n\t"
                 "mov %%ax, %%es: ((80 * 3 + 40) * 2)(%%ebx)\n\t"
                 ::"r"(ch):);
    }
The red character on the screen isn't `B` as expected. However, when I change the input constraint from `r` to `c`, so that the last line of the asm statement becomes `::"c"(ch):);`, the character 'B' displays normally.
What's the difference? I access the video memory through the data segment directly after the computer has entered protected mode.
I have traced the assembly code and found that the instruction was assembled to `mov al, al` when the `r` constraint is used. The value of `ax` is `0x0010`, so `al` is `0x10`. That explains the result, but why did the compiler choose the `al` register? Isn't it supposed to choose a register which hasn't been used before? When I add a clobber list, the problem is solved.
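For reference, a minimal sketch of what the clobber fix looks like. To stay runnable as an ordinary user-space program it writes into a stand-in array instead of real video memory and does not touch `%es`; `fake_vga` and `put_red_char` are made-up names for illustration. Listing `ax` as clobbered tells gcc that the template overwrites it, so it can no longer pick `al` for the `ch` input.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 80x25 text buffer, so this runs in user space. */
static uint16_t fake_vga[80 * 25];

/* Store `ch` with the red attribute (0x04) into text cell `cell`. */
static void put_red_char(char ch, unsigned cell)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile (
        "movb $0x04, %%ah\n\t"      /* attribute byte: red on black */
        "movb %[ch], %%al\n\t"      /* character byte */
        "movw %%ax, %[cell]"        /* one 16-bit store: attribute + char */
        : [cell] "=m"(fake_vga[cell])
        : [ch] "q"(ch)              /* byte-addressable register */
        : "ax");                    /* the template overwrites ax */
#else
    /* Plain C fallback for non-x86 hosts. */
    fake_vga[cell] = (uint16_t)(0x0400u | (unsigned char)ch);
#endif
}
```

With this, `put_red_char('B', 80 * 3 + 40)` stores `0x0442` (attribute `0x04`, character `0x42`) in that cell.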
Like @MichaelPetch commented, you can use 32-bit addresses to access whatever memory you want from C. The asm gcc emits will assume a flat memory space, and assume that it can copy `esp` to `edi` and use `rep stos` to zero some stack memory, for example (this requires that `%es` has the same base as `%ss`).

I'd guess that the best solution is not to use any inline asm, but instead just use a global constant as a pointer to `char`. E.g. from gcc6.1 on Godbolt (link below), with `-O3 -m32`.
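A minimal sketch of the constant-pointer approach, under a few assumptions: `BARE_METAL`, `fake_vga` and `store_in_flat_memory` are hypothetical names, and the pointer type is `uint16_t` rather than `char` so that one store writes both the character and its attribute. Without `BARE_METAL` the pointer targets a stand-in array, so the code also runs as a normal program.

```c
#include <stdint.h>

#ifdef BARE_METAL   /* hypothetical macro: define it when building the kernel */
#define VGA_TEXT_ADDR ((volatile uint16_t *)0xb8000)
#else
static uint16_t fake_vga[80 * 25];              /* hosted stand-in buffer */
#define VGA_TEXT_ADDR ((volatile uint16_t *)fake_vga)
#endif

/* The pointer itself is const; the memory it points to is not. */
static volatile uint16_t *const vga_base = VGA_TEXT_ADDR;

/* `cell` counts 16-bit character cells, not bytes. */
static void store_in_flat_memory(unsigned char c, uint32_t cell)
{
    /* c is unsigned, so it zero-extends instead of sign-extending. */
    vga_base[cell] = (uint16_t)(0x0400u | c);
}
```

No clobber list is needed here: the compiler allocates every register itself and can fold the base address into the store's addressing mode.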
Without the `const`, code like `vga_base[10] = 0x4 << 8 | 'A';` would have to load the `vga_base` global and then offset from it. With the `const`, `&vga_base[10]` is a compile-time constant.

If you really want a segment:
Since you can't leave `%es` modified, you need to save/restore it. This is another reason to avoid using it in the first place. If you really want a special segment for something, set up `%fs` or `%gs` once and leave them set, so it doesn't affect the normal operation of any instructions that don't use a segment override.

There is builtin syntax to use `%fs` or `%gs` without inline asm, for thread-local variables. You might be able to take advantage of it to avoid inline asm altogether.

If you're using a custom segment, you could make its base address non-zero, so you don't need to add `0xb8000` yourself. However, Intel CPUs optimize for the flat memory case, so address generation using a non-zero segment base is a couple cycles slower, IIRC.

I did find a request for gcc to allow segment overrides without inline asm, and a question about adding segment support to gcc. Currently you can't do that.
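The builtin thread-local syntax mentioned above is the GNU C `__thread` keyword. A minimal sketch (`calls` and `count_call` are made-up names for illustration): on x86-64 Linux gcc addresses such a variable relative to `%fs`, and on 32-bit Linux relative to `%gs`, with no inline asm involved. It relies on the runtime having set up the thread-local segment, which a bare-metal kernel would have to do itself.

```c
/* Each thread gets its own copy of this variable. */
static __thread unsigned int calls;

/* Increment the calling thread's counter and return the new value. */
unsigned int count_call(void)
{
    return ++calls;
}
```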
Doing it manually in asm, with a dedicated segment
To look at the asm output, I put it on Godbolt with the `-mx32` ABI, so args are passed in registers, but addresses don't need to be sign-extended to 64 bits. (I wanted to avoid the noise of loading args from the stack for `-m32` code. The `-m32` asm for protected mode will look similar.)

So this code gets gcc to do an excellent job at using an addressing mode to do the address math, and do as much as possible at compile time.
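A minimal sketch of such a store, assuming `%fs` was loaded once at startup with a selector for a suitable data segment. `VGA_SEGMENT_IN_FS`, `fake_vga` and `store_in_special_segment` are hypothetical names; without the macro the function falls back to a plain C store into a stand-in array so it can be run as a normal program. Passing the destination as an `"m"` operand lets gcc choose the addressing mode and fold `0xb8000` into the displacement.

```c
#include <stdint.h>
#include <string.h>

#ifndef VGA_SEGMENT_IN_FS
static uint16_t fake_vga[80 * 25];   /* hosted stand-in buffer */
#endif

/* `byte_offset` is in bytes from the start of the text buffer. */
static void store_in_special_segment(unsigned char c, uint32_t byte_offset)
{
    uint16_t val = (uint16_t)(0x0400u | c);   /* red attribute + character */
#ifdef VGA_SEGMENT_IN_FS
    /* Not a real flat address: it is only meaningful relative to %fs. */
    char *dst = (char *)0xb8000 + byte_offset;
    __asm__ volatile ("movw %[val], %%fs:%[dest]"
                      :
                      : [val] "ri"(val),      /* register or immediate */
                        [dest] "m"(*dst)
                      : "memory");            /* we write through an input operand */
#else
    memcpy((char *)fake_vga + byte_offset, &val, sizeof val);
#endif
}
```

The template names no fixed registers, so the only clobber needed is `"memory"`.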
Segment register
If you do want to modify a segment register for every store, keep in mind that it's slow: Agner Fog's insn tables stop including `mov sr, r` after Nehalem, but on Nehalem it's a 6 uop instruction that includes 3 load uops (from the GDT I assume). It has a throughput of one per 13 cycles. Reading a segment register is fine (e.g. `push sr` or `mov r, sr`). `pop sr` is even a bit slower.

I'm not even going to write code for this, because it's such a bad idea. Make sure you use clobber constraints to let the compiler know about every register you step on, or you will have hard-to-debug errors where surrounding code stops working.
See the x86 tag wiki for GNU C inline asm info.