I have a question regarding how to initialize an array in assembly. I tried:
.bss
#the array
unsigned: .skip 10000
.data
#these are the values that I want to put in the array
par4: .quad 500
par5: .quad 10
par6: .quad 15
That's how I declared my array and the values that I want to put inside it.
This is how I tried to put them into the array:
movq $0 , %r8
movq par4 , %rax
movq %rax , unsigned(%r8)
incq %r8
movq par5 , %rax
movq %rax , unsigned(%r8)
incq %r8
movq par6 , %rax
movq %rax , unsigned(%r8)
I tried printing the elements to check if everything is okay, and only the last one prints okay, the other two have some weird values.
Maybe this is not the way I should declare and work with it?
First of all, unsigned is the name of a type in C, so it's a poor choice for an array. Let's call it arr instead.
You want to treat that block of space in the BSS as an array of qword elements, so each element is 8 bytes. That means you need to store to arr+0, arr+8, and arr+16. (The total size of your array is 10000 bytes, which is 10000/8 = 1250 qwords.)
But you're using %r8 as a byte offset, not a scaled index. That's generally a good thing, all else equal; indexed addressing modes are slower in some cases on some CPUs. The problem is that you only increment it by 1 with inc, instead of by 8 with add $8, %r8.
So you're actually storing to arr+0, arr+1, and arr+2, with 8-byte stores that overlap each other. Each later store overwrites all but the least-significant byte of the store before it, so only the last store survives intact. x86 is little-endian, so the resulting contents of memory are effectively this, followed by the rest of the unwritten bytes that stay zero.
# static array that matches what you actually stored
arr: .byte 500 & 0xFF, 10, 15, 0, 0, 0, 0, 0, 0, 0, ...
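To see why only the last element printed correctly, here is a small C model of those stores. It is only a sketch: store_qword, load_qword, fill and the 32-byte ARR_BYTES stand-in for the 10000-byte array are names made up for this illustration. The bytes are written in little-endian order explicitly, so the model mimics x86 regardless of the host it runs on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARR_BYTES 32  /* hypothetical small stand-in for the 10000-byte array */

/* Model of `movq %rax, arr(%r8)`: an 8-byte little-endian store at a byte offset. */
static void store_qword(uint8_t *arr, size_t offset, uint64_t value) {
    assert(offset + 8 <= ARR_BYTES);
    for (int i = 0; i < 8; i++)
        arr[offset + i] = (uint8_t)(value >> (8 * i));
}

/* Model of an 8-byte little-endian load from a byte offset. */
static uint64_t load_qword(const uint8_t *arr, size_t offset) {
    assert(offset + 8 <= ARR_BYTES);
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint64_t)arr[offset + i] << (8 * i);
    return value;
}

/* Store 500, 10, 15 with `stride` bytes between consecutive stores.
   stride == 1 models the inc version; stride == 8 models add $8. */
static void fill(uint8_t *arr, size_t stride) {
    static const uint64_t values[3] = {500, 10, 15};
    memset(arr, 0, ARR_BYTES);  /* the BSS starts out zeroed */
    for (size_t i = 0; i < 3; i++)
        store_qword(arr, i * stride, values[i]);
}
```

With a stride of 1, the first bytes end up as 0xF4, 0x0A, 0x0F, and qword loads at byte offsets 0, 1 and 2 give 985844 (0x0F0AF4), 3850 (0x0F0A) and 15. That matches "only the last one prints okay", assuming the elements were read back with the same offsets they were stored at. With a stride of 8, loads at offsets 0, 8 and 16 give 500, 10 and 15.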
You could of course just use .quad in the .data section to declare a static array with the contents you want. But with only the first 3 elements non-zero, putting an array that large in the BSS makes sense, instead of having the OS page in the zeros from disk.
If you're going to fully unroll instead of using a loop over your 3-element qword array starting at par4, you don't need to increment a register at all. You also don't need the initializers to be in data memory; you can just use immediates, because they all fit as 32-bit sign-extended values.
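As a rough check of the "fits as 32-bit sign-extended" condition, here is a minimal C sketch. The names fits_simm32 and check_question_values are made up for this illustration; the rule it models is that movq with an immediate source takes a 32-bit immediate and sign-extends it to 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `value` survives a round trip through a sign-extended 32-bit
   immediate, i.e. it lies in [-2^31, 2^31 - 1]. */
static bool fits_simm32(int64_t value) {
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* The three initializers from the question all qualify. */
static void check_question_values(void) {
    assert(fits_simm32(500));
    assert(fits_simm32(10));
    assert(fits_simm32(15));
}
```

A value such as 0x80000000 does not qualify, because sign extension would turn it into a negative 64-bit number; a constant like that would need to go through a register (movabs or a 32-bit mov that zero-extends) or be loaded from memory.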
# these are assemble-time constants, not associated with a section
.equ par4, 500
.equ par5, 10
.equ par6, 15
.text # already the default section but whatever
.globl _start
_start:
movq $par4, arr(%rip) # use RIP-relative addressing when there's no register
movq $par5, arr+8(%rip)
movq $par6, arr+16(%rip)
xor %edi, %edi # exit status = 0
mov $60, %eax # __NR_exit
syscall # Linux exit(0)
.bss
arr: .skip 10000
You can run that under GDB and examine memory to see what you get. (Compile it with gcc -nostdlib -static foo.s.) In GDB, start the program with starti (to stop at the entry point), then single-step with si. Use x /4g &arr to dump the contents of memory at arr as an array of 4 qwords.
Or if you did want to use a register, you might as well increment a pointer instead of an index.
lea arr(%rip), %rdi # or mov $arr, %edi in a non-PIE executable
movq $par4, (%rdi)
add $8, %rdi # advance the pointer 8 bytes = 1 element
movq $par5, (%rdi)
add $8, %rdi
movq $par6, (%rdi)
Or scaled-index:
## Scaled-index addressing
movq $par4, arr(%rip)
mov $1, %eax
movq $par5, arr(,%rax,8) # [arr + rax*8]; 32-bit absolute address, so non-PIE only
inc %eax
movq $par6, arr(,%rax,8)
Fun trick: for values that fit in 8 bits (10 and 15 here, but not 500), you could just do a byte store instead of a qword store to set the low byte, and leave the rest zero. This would save code size, but if you did a qword load right away, you'd get a store-forwarding stall (~10 cycles of extra latency for the store/reload to merge data from the cache with the store from the store buffer).
Or if you did still want to copy 24 bytes from par4 in .rodata, you could use SSE. x86-64 guarantees that SSE2 is available.
movaps par4(%rip), %xmm0
movaps %xmm0, arr(%rip) # copy par4 and par5
mov par6(%rip), %rax # aka par4+16
mov %rax, arr+16(%rip)
.section .rodata # read-only data.
.p2align 4 # align by 2^4 = 16 for movaps
par4: .quad 500
par5: .quad 10
par6: .quad 15
.bss
.p2align 4 # align by 16 for movaps
arr: .skip 10000
# or use .lcomm arr, 10000 without even switching to .bss
Or with SSE4.1, you can load+expand small constants so you don't need a whole qword for each small number that you're going to copy into the BSS array.
pmovzxwq initializers(%rip), %xmm0 # zero-extend 2 words into 2 qwords
movaps %xmm0, arr(%rip)
movzwl initializers+4(%rip), %eax # zero-extending word load
mov %rax, arr+16(%rip)
.section .rodata
initializers: .word 500, 10, 15
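The effect of that load+expand step can be modelled in plain C. This is only a sketch of the data transformation, not of the SSE4.1 instruction itself (pmovzxwq); widen_words is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-extend `count` 16-bit words into 64-bit qwords, which is what the
   vector load does for 2 elements and the scalar word load does for 1. */
static void widen_words(const uint16_t *src, uint64_t *dst, size_t count) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];  /* unsigned source, so the upper 48 bits become 0 */
}
```

Called with the words 500, 10, 15 and a 3-element destination, it produces the qwords 500, 10, 15. The read-only initializers then take 3 * 2 = 6 bytes instead of 3 * 8 = 24.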