I have a question regarding how to initialize an array in assembly. I tried:
.bss
#the array
unsigned: .skip 10000
.data
#these are the values that I want to put in the array
par4: .quad 500
par5: .quad 10
par6: .quad 15
That's how I declared my array and the variables that I want to put inside it. This is how I tried to put them into the array:
movq $0 , %r8
movq par4 , %rax
movq %rax , unsigned(%r8)
incq %r8
movq par5 , %rax
movq %rax , unsigned(%r8)
incq %r8
movq par6 , %rax
movq %rax , unsigned(%r8)
I tried printing the elements to check if everything is okay, and only the last one prints okay, the other two have some weird values.
Maybe this is not the way I should declare and work with it?
First of all, `unsigned` is the name of a type in C, so it's a poor choice of name for an array. Let's call it `arr` instead.

You want to treat that block of space in the BSS as an array of qword elements, so each element is 8 bytes and you need to store to `arr+0`, `arr+8`, and `arr+16`. (The total size of your array is 10000 bytes, which is 10000/8 = 1250 qwords.)

But you're using `%r8` as a byte offset, not a scaled index. That's generally a good thing, all else equal; indexed addressing modes are slower in some cases on some CPUs. The problem is that you only increment it by 1 with `inc`, not by 8 with `add $8, %r8`.

So you're actually storing to `arr+0`, `arr+1`, and `arr+2`, with 8-byte stores that overlap each other, leaving just the least-significant byte of each store except the last. x86 is little-endian, so the resulting contents of memory are effectively the bytes `0xF4, 0x0A, 0x0F` (the low byte of 500, then 10, then 15), followed by bytes that are all zero. If you reload qwords from those same three addresses, you get 985844, 3850, and 15, which is consistent with only the last one looking right.

You could of course just use `.quad` in the `.data` section to declare a static array with the contents you want. But with only the first 3 elements non-zero, putting an array that large in the BSS makes sense, instead of having the OS page in the zeros from disk.

If you're going to fully unroll instead of using a loop over your 3-element qword array starting at `par4`, you don't need to increment a register at all. You also don't need the initializers to be in data memory; you can just use immediates (for example `movq $500, arr`), because they all fit as 32-bit sign-extended values.

You can run the result under GDB and examine memory to see what you get. (Compile it with `gcc -nostdlib -static foo.s`.) In GDB, start the program with `starti` (to stop at the entry point), then single-step with `si`. Use `x /4g &arr` to dump the contents of memory at `arr` as an array of 4 qwords.

Or if you did want to use a register, might as well just loop a pointer instead of an index.
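A minimal sketch of such a pointer loop, assuming Linux x86-64 and GNU as. The `par_end` and `.Lcopy` labels, the move of the initializers into `.rodata`, and the raw `exit` system call at the end are choices made for this illustration, not part of the question's code.

```asm
# Copy 3 qwords from par4.. into arr by advancing two pointers.
.section .rodata
par4: .quad 500
par5: .quad 10
par6: .quad 15
par_end:                         # hypothetical label: one past the last initializer

.bss
arr: .skip 10000                 # 1250 qwords, zero-initialized

.text
.globl _start
_start:
    lea  par4(%rip), %rsi        # source pointer
    lea  arr(%rip), %rdi         # destination pointer
    lea  par_end(%rip), %rdx     # end of the source data
.Lcopy:
    mov  (%rsi), %rax            # load one qword
    mov  %rax, (%rdi)            # store it into the array
    add  $8, %rsi                # advance both pointers by one element (8 bytes)
    add  $8, %rdi
    cmp  %rdx, %rsi
    jne  .Lcopy                  # loop until the source pointer reaches par_end

    mov  $60, %eax               # Linux x86-64 exit(0)
    xor  %edi, %edi
    syscall
```

Stepping by 8 in the `add` instructions is what the original `incq` was missing; the addressing mode itself stays a plain register-indirect one.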
Or scaled-index:
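A sketch of the scaled-index variant under the same assumptions (Linux x86-64, GNU as, initializers in `.rodata`; the `.Lcopy` label and the `exit` system call are illustrative). Here the register counts elements, and the `,8` scale factor in the addressing mode turns that count into a byte offset.

```asm
# Copy 3 qwords using an element index scaled by 8 in the addressing mode.
.section .rodata
par4: .quad 500
par5: .quad 10
par6: .quad 15

.bss
arr: .skip 10000

.text
.globl _start
_start:
    lea  par4(%rip), %rsi        # base of the source data
    lea  arr(%rip), %rdi         # base of the array
    xor  %ecx, %ecx              # element index i = 0
.Lcopy:
    mov  (%rsi,%rcx,8), %rax     # load source element i
    mov  %rax, (%rdi,%rcx,8)     # arr[i] = that value
    inc  %rcx                    # incrementing by 1 is correct here: the index is scaled
    cmp  $3, %rcx
    jb   .Lcopy                  # repeat while i < 3

    mov  $60, %eax               # Linux x86-64 exit(0)
    xor  %edi, %edi
    syscall
```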
Fun trick: you could just do a byte store instead of a qword store to set the low byte, and leave the rest zero. This would save code-size but if you did a qword load right away, you'd get a store-forwarding stall. (~10 cycles extra latency for the store/reload to merge data from the cache with the store from the store buffer)
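As a rough sketch of that trick (assuming Linux x86-64 and GNU as, and relying on the BSS starting out zeroed): 10 and 15 fit in one byte, but 500 does not, so this example uses a 16-bit store for the first element.

```asm
# Narrow stores into a zero-initialized array; the untouched upper bytes stay 0.
.bss
arr: .skip 10000

.text
.globl _start
_start:
    movw $500, arr(%rip)         # 500 = 0x01F4 needs two bytes
    movb $10,  arr+8(%rip)       # low byte of arr[1]
    movb $15,  arr+16(%rip)      # low byte of arr[2]

    mov  $60, %eax               # Linux x86-64 exit(0)
    xor  %edi, %edi
    syscall
```

This only works while the rest of each element is still zero, so it is not a general replacement for a qword store.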
Or if you did still want to copy 24 bytes from `par4` in `.rodata`, you could use SSE. x86-64 guarantees that SSE2 is available.

Or with SSE4.1, you can load+expand small constants so you don't need a whole qword for each small number that you're going to copy into the BSS array.
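Both ideas might look something like the sketch below, assuming Linux x86-64 and GNU as. The names `small` and `arr2` are hypothetical, added so the two approaches can sit in one file, and the padding word in `small` is there so the second 4-byte load stays inside the table. Unaligned `movdqu` is used because nothing here forces 16-byte alignment of the arrays.

```asm
.section .rodata
par4:  .quad 500
par5:  .quad 10
par6:  .quad 15
small: .word 500, 10, 15, 0      # hypothetical 16-bit initializers, last word is padding

.bss
arr:  .skip 10000
arr2: .skip 10000                # hypothetical second array for the SSE4.1 variant

.text
.globl _start
_start:
    # SSE2: copy 24 bytes as one 16-byte and one 8-byte chunk.
    movdqu par4(%rip), %xmm0     # par4 and par5
    movq   par6(%rip), %xmm1     # par6 (upper half of xmm1 is zeroed)
    movdqu %xmm0, arr(%rip)      # arr[0], arr[1]
    movq   %xmm1, arr+16(%rip)   # arr[2]

    # SSE4.1 (not baseline x86-64): zero-extend 16-bit values to qwords.
    pmovzxwq small(%rip), %xmm0    # {500, 10} -> two qwords
    pmovzxwq small+4(%rip), %xmm1  # {15, 0}   -> two qwords
    movdqu %xmm0, arr2(%rip)       # arr2[0], arr2[1]
    movdqu %xmm1, arr2+16(%rip)    # arr2[2], arr2[3] (arr2[3] was already 0)

    mov  $60, %eax               # Linux x86-64 exit(0)
    xor  %edi, %edi
    syscall
```

The SSE4.1 half faults with an illegal-instruction signal on a CPU without that extension, so real code would need to know the target supports it.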