len: equ 2
len: db 2
Are they the same, producing a label that can be used instead of 2? If not, then what is the advantage or disadvantage of each declaration form? Can they be used interchangeably?
equ: resolved at assembly time, much like a preprocessor definition. It is analogous to #define, but most assemblers lack an #undef counterpart, and the right-hand side can only be an atomic constant of a fixed number of bytes, so floats, doubles and lists are not supported by most assemblers' equ directive.
db: also handled at assembly time, but the value given to db is stored in the binary output by the assembler at a specific offset. equ lets you define constants that would otherwise have to be hardcoded or loaded with a mov. db lets you have data available in memory before the program even starts.
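As a rough C analogy (an illustration only, not a description of how NASM works internally), equ behaves like a #define: the value is substituted where it is used and no storage is allocated for it. db behaves like an initialized one-byte object: the name refers to a memory location holding the value. The names LEN_EQU, len_db, equ_value, db_address and db_value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Like `len: equ 2`: a pure constant. The preprocessor substitutes it
   wherever it appears; no storage is allocated and it has no address. */
#define LEN_EQU 2

/* Like `len: db 2`: one byte of initialized data. The name refers to a
   location in memory that holds the value 2. */
static const unsigned char len_db = 2;

/* The constant is substituted directly into the expression. */
int equ_value(void)
{
    return LEN_EQU;
}

/* The data object is reached through its address. */
const unsigned char *db_address(void)
{
    return &len_db;
}

/* Reading the db-style value means dereferencing that address. */
int db_value(void)
{
    const unsigned char *p = db_address();
    assert(p != NULL);
    return *p;
}
```

Both equ_value() and db_value() return 2, but only len_db occupies storage (sizeof len_db is 1). Writing &LEN_EQU would not compile, just as an equ symbol has no memory behind it to dereference.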
Here's a NASM example demonstrating db:
An equ can only define a constant up to the largest integer size the assembler supports.
Here's an example of equ, along with a few of its common limitations:
The resulting binary has no bytes at all, because equ does not pollute the image: all references to an equ symbol are replaced by its right-hand side.
Summary
NASM 2.10.09 ELF output:
db does not have any magic effects: it simply outputs bytes directly to the output object file. If those bytes happen to be in front of a symbol, the symbol will point to that value when the program starts.
If you are in the text section, your bytes will get executed.
Whether you use db or dw, etc., that does not specify the size of the symbol: the st_size field of the symbol table entry is not affected.
equ makes the symbol on the current line have the st_shndx == SHN_ABS magic value in its symbol table entry. Instead of outputting bytes at the current object file location, it writes the value into the st_value field of the symbol table entry. All else follows from this.
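The two behaviours can be sketched as a toy model of an assembler writing an object file. Only the field names st_value, st_shndx, st_size and the constant SHN_ABS (0xfff1) come from the ELF specification; the structures, the section index DATA_SHNDX and the functions asm_label, asm_db, asm_equ and find_sym are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHN_ABS    0xfff1u /* ELF special section index: absolute value */
#define DATA_SHNDX 2u      /* hypothetical index of the .data section */
#define MAX_BYTES  64
#define MAX_SYMS   8

/* Simplified symbol table entry. */
struct toy_sym {
    const char *name;
    uint32_t st_value; /* offset inside .data, or the absolute value */
    uint32_t st_size;  /* neither db nor equ sets this in the model */
    uint16_t st_shndx; /* DATA_SHNDX, or SHN_ABS */
};

struct toy_obj {
    unsigned char data[MAX_BYTES]; /* contents of .data */
    size_t size;                   /* current location counter */
    struct toy_sym syms[MAX_SYMS];
    size_t nsyms;
};

/* `name:` binds name to the current offset in .data. */
static void asm_label(struct toy_obj *o, const char *name)
{
    assert(o->nsyms < MAX_SYMS);
    o->syms[o->nsyms++] = (struct toy_sym){ name, (uint32_t)o->size, 0, DATA_SHNDX };
}

/* `db b` appends one byte to .data and touches no symbol. */
static void asm_db(struct toy_obj *o, unsigned char b)
{
    assert(o->size < MAX_BYTES);
    o->data[o->size++] = b;
}

/* `name: equ v` emits no bytes; the value goes into st_value. */
static void asm_equ(struct toy_obj *o, const char *name, uint32_t v)
{
    assert(o->nsyms < MAX_SYMS);
    o->syms[o->nsyms++] = (struct toy_sym){ name, v, 0, SHN_ABS };
}

static const struct toy_sym *find_sym(const struct toy_obj *o, const char *name)
{
    for (size_t i = 0; i < o->nsyms; i++)
        if (strcmp(o->syms[i].name, name) == 0)
            return &o->syms[i];
    return NULL;
}
```

Modelling `x: equ 1` followed by `y: db 1` and a second `db 2` (asm_equ, asm_label, asm_db, asm_db) leaves two bytes in .data, gives x st_shndx == SHN_ABS with st_value == 1, and gives y st_value == 0 with st_size still 0.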
To understand what that really means, you should first understand the basics of the ELF standard and relocation.
SHN_ABS theory
SHN_ABS tells the linker that the st_value field of the symbol entry is to be used as a value directly. Contrast this with "regular" symbols, whose value is a memory address instead, and must therefore go through relocation.
Since it does not point to memory, an SHN_ABS symbol can effectively be removed from the executable by the linker by inlining it. But such symbols are still regular symbols in object files, do take up space there, and could be shared among multiple files if global.
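A minimal sketch of the linker-side rule, assuming a single section whose final load address is already known. struct link_sym and resolve are hypothetical names; the model relies on the fact that, in a relocatable object file, a non-absolute symbol's st_value is an offset into its section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHN_ABS 0xfff1u /* ELF special section index for absolute symbols */

/* Hypothetical, trimmed-down view of an ELF symbol table entry. */
struct link_sym {
    uint32_t st_value; /* absolute value, or offset inside its section */
    uint16_t st_shndx; /* SHN_ABS, or the index of the defining section */
};

/* Value a reference to the symbol finally receives.
   section_base is the address the linker assigned to the symbol's
   section (single-section model; ignored for SHN_ABS symbols). */
uint32_t resolve(const struct link_sym *s, uint32_t section_base)
{
    assert(s != NULL);
    if (s->st_shndx == SHN_ABS)
        return s->st_value;            /* used as-is: nothing to relocate */
    return section_base + s->st_value; /* a memory address: relocated */
}
```

Assuming y sits at offset 0 of a section loaded at 0x8049088 (the address observed in the output below), resolve gives 0x8049088 for y, while an absolute x with value 1 resolves to 1 whatever the base.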
Sample usage
Note that since the symbol x contains a literal value, no dereference [] needs to be done to it, unlike for y.
If we wanted to use x from a C program, we'd need something like:
and, on the asm side:
Empirical observation of generated output
We can observe what we've said before with:
Now:
contains:
Ndx is st_shndx, so we see that x is SHN_ABS while y is not.
Also see that Size is 0 for y: db in no way told y that it was a single byte wide. We could simply add two db directives to allocate 2 bytes there.
And then:
gives:
So we see that 0x1 was inlined into the instruction, while y got the value of a relocation address, 0x8049088.
Tested on Ubuntu 14.04 AMD64.
Docs
http://www.nasm.us/doc/nasmdoc3.html#section-3.2.4:
See also
Analogous question for GAS: Difference between .equ and .word in ARM Assembly?
.equiv seems to be the closest GAS equivalent.
The first is an equate, similar to C's #define len 2, in that it doesn't actually allocate any space in the final code; it simply sets the len symbol to be equal to 2. Then, when you use len later on in your source code, it's the same as if you were using the constant 2.
The second is define byte, similar to a C declaration such as char len = 2;. It does actually allocate space (one byte in memory), stores a 2 there, and sets len to be the address of that byte.
Here's some pseudo-assembler code that shows the distinction:
Line 1 simply sets the assembly address to 1234, to make it easier to explain what's happening.
In line 2, no code is generated; the assembler simply loads elen into the symbol table with the value 2. Since no code has been generated, the address does not change. Then, when you use it on line 4, it loads that value into the register.
Line 3 shows that db is different: it actually allocates some space (one byte) and stores the value in that space. It then loads dlen into the symbol table but gives it the value of that address, 1234, rather than the constant value 2.
When you later use dlen on line 5, you get the address, which you would have to dereference to get the actual value 2.
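The line-by-line walkthrough can be mimicked with a toy symbol table and memory array. Everything here is hypothetical: ORG stands in for the starting address 1234 of line 1, the source lines in the comments are assumed spellings, and memory, define_equ, define_db and lookup are illustrative names. Real mov semantics (register widths, addressing modes) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORG      1234u /* line 1: assembly address starts at 1234 */
#define MEM_SIZE 4096u
#define MAX_SYMS 8

static unsigned char memory[MEM_SIZE];
static uint32_t location = ORG; /* current assembly address */

struct entry { const char *name; uint32_t value; };
static struct entry table[MAX_SYMS];
static size_t nentries;

/* line 2 (`elen equ 2`): record the constant; emit nothing. */
static void define_equ(const char *name, uint32_t value)
{
    assert(nentries < MAX_SYMS);
    table[nentries++] = (struct entry){ name, value };
}

/* line 3 (`dlen db 2`): store the byte at the current address,
   record that address as the symbol's value, then advance. */
static void define_db(const char *name, unsigned char byte)
{
    assert(nentries < MAX_SYMS && location < MEM_SIZE);
    memory[location] = byte;
    table[nentries++] = (struct entry){ name, location };
    location += 1;
}

/* Value of a symbol used as an operand (what lines 4 and 5 load). */
static uint32_t lookup(const char *name)
{
    for (size_t i = 0; i < nentries; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    assert(0 && "undefined symbol");
    return 0;
}
```

Calling define_equ("elen", 2) and then define_db("dlen", 2) leaves the location counter at 1235. lookup("elen") is 2, lookup("dlen") is 1234, and only the extra step memory[lookup("dlen")] recovers the stored 2, mirroring the dereference described for line 5.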