I'm not trying to provoke an Intel vs AT&T war (a moot point anyway, now that they both support Intel syntax) or ask which one is "better" per se; I just want to know the practical differences in choosing one or the other.
Basically, when I was picking up some basic x86 assembly a few years back, I used NASM for no reason other than that the book I was reading did too, which put me firmly but involuntarily in the NASM camp. Since then, I've had very little cause to use assembly, so I haven't had the opportunity to try GAS.
Bearing in mind that they both support Intel syntax (which I personally prefer) and should, theoretically at least, produce the same binary (I know they probably won't, but the meaning shouldn't be changed), what are the reasons to favour one or the other?
Is it command line options? Macros? Non-mnemonic keywords? Or something else?
Thanks :)
Intel Syntax: mov eax, 1 (instruction destination, source)
AT&T Syntax: movl $1, %eax (instruction source, destination)
The Intel syntax is pretty self-explanatory. In the above example, the amount of data being moved is inferred from the size of the register (32 bits in the case of eax). The addressing mode used is inferred from the operands themselves.
There are some quirks when it comes to the AT&T syntax. Firstly, notice the l suffix at the end of the mov instruction: it stands for long and signifies 32 bits of data. Other instruction suffixes include w for a word (16 bits, not to be confused with the word size of your CPU!), q for a quad-word (64 bits) and b for a single byte. Whilst not always required (GAS can usually infer the size from a register operand), assembly code written in AT&T syntax typically states the amount of data being operated on explicitly.
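To make the size rule concrete, here is a minimal Python sketch (not an assembler) of the inference an Intel-syntax assembler performs and of the suffix AT&T syntax asks you to spell out. REGISTER_BITS and att_suffix are hypothetical names for illustration, and the table deliberately covers only a few general-purpose registers.

```python
# Hypothetical lookup table: a few general-purpose register names mapped to
# their width in bits. Deliberately incomplete; for illustration only.
REGISTER_BITS = {
    "al": 8, "bl": 8, "cl": 8, "dl": 8,
    "ax": 16, "bx": 16, "cx": 16, "dx": 16,
    "eax": 32, "ebx": 32, "ecx": 32, "edx": 32,
    "rax": 64, "rbx": 64, "rcx": 64, "rdx": 64,
}

# AT&T mnemonic suffixes: b = byte, w = word, l = long, q = quad-word.
SUFFIX_FOR_BITS = {8: "b", 16: "w", 32: "l", 64: "q"}


def att_suffix(register: str) -> str:
    """Return the AT&T suffix implied by the size of an Intel-syntax register."""
    bits = REGISTER_BITS[register.lower()]  # KeyError for registers not in the table
    return SUFFIX_FOR_BITS[bits]


if __name__ == "__main__":
    for reg in ("al", "ax", "eax", "rax"):
        # e.g. "mov eax, 1" in Intel syntax corresponds to "movl $1, %eax"
        print(f"mov {reg}, 1  ->  mov{att_suffix(reg)} $1, %{reg}")
```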
More explicitness is required when it comes to the addressing mode used on the source and destination operands. $ signifies immediate addressing, i.e. use the value encoded in the instruction itself. If the above example were written without the $, direct addressing would be used: the CPU would try to fetch the value at memory address 1, which will more than likely result in a segmentation fault. The % signifies register addressing; if you left it out of the above example, eax would be treated as a symbol, i.e. a labelled memory address, which would more than likely result in an undefined reference at link time. So you must be explicit about the addressing mode of both the source and the destination operand.
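The following Python sketch illustrates the translation rules described above for the simplest case: swap the operand order, prefix registers with % and immediates with $, and append the size suffix. intel_to_att and att_operand are hypothetical helpers; the sketch assumes a register destination and register or integer-literal operands only, and memory operands are deliberately out of scope.

```python
# Hypothetical register table (incomplete, for illustration only).
REGISTER_BITS = {
    "al": 8, "ax": 16, "eax": 32, "rax": 64,
    "bl": 8, "bx": 16, "ebx": 32, "rbx": 64,
}
SUFFIX_FOR_BITS = {8: "b", 16: "w", 32: "l", 64: "q"}


def att_operand(op: str) -> str:
    """Prefix one register or immediate operand as AT&T syntax requires."""
    op = op.strip()
    if op.lower() in REGISTER_BITS:
        return "%" + op.lower()  # register addressing
    try:
        int(op, 0)  # accept decimal or 0x-prefixed literals only
    except ValueError:
        raise ValueError(f"unsupported operand: {op!r}") from None
    return "$" + op  # immediate addressing


def intel_to_att(instr: str) -> str:
    """Translate e.g. 'mov eax, 1' into 'movl $1, %eax'.

    Assumes a two-operand instruction whose destination is a register and
    whose source is a register or an integer literal.
    """
    mnemonic, operands = instr.split(None, 1)
    dest, src = (part.strip() for part in operands.split(","))
    # The size comes from the destination register.
    suffix = SUFFIX_FOR_BITS[REGISTER_BITS[dest.lower()]]
    # AT&T reverses the order: source first, destination last.
    return f"{mnemonic.lower()}{suffix} {att_operand(src)}, {att_operand(dest)}"


if __name__ == "__main__":
    print(intel_to_att("mov eax, 1"))    # movl $1, %eax
    print(intel_to_att("add rbx, rax"))  # addq %rax, %rbx
```

Leaving out the $ or % in real AT&T source is exactly what turns the immediate into a memory access or the register into a symbol, which is why the sketch always adds one or the other.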
The way memory operands are specified is also different:
Intel: [base register + index register * scale + offset]
AT&T: offset(base register, index register, scale)
where the scale factor is 1, 2, 4 or 8 (typically the size of an array element).
The Intel syntax makes it a little clearer what calculation is taking place to find the memory address. With the AT&T syntax the result is the same, but you are expected to know the calculation taking place.
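The sketch below, again in Python and purely illustrative, renders the same memory operand in both notations and computes the address it refers to, which is the calculation the AT&T form leaves implicit. The helper names, the register values and the 32-bit wraparound are assumptions made for the example.

```python
from typing import Dict, Optional

VALID_SCALES = (1, 2, 4, 8)


def _check_scale(scale: int) -> None:
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}, got {scale}")


def intel_mem(disp: int = 0, base: Optional[str] = None,
              index: Optional[str] = None, scale: int = 1) -> str:
    """Intel/NASM-style operand: [base + index*scale + disp]."""
    _check_scale(scale)
    terms = []
    if base:
        terms.append(base)
    if index:
        terms.append(f"{index}*{scale}")
    text = " + ".join(terms)
    if not terms:
        text = str(disp)  # absolute address only
    elif disp > 0:
        text += f" + {disp}"
    elif disp < 0:
        text += f" - {-disp}"
    return f"[{text}]"


def att_mem(disp: int = 0, base: Optional[str] = None,
            index: Optional[str] = None, scale: int = 1) -> str:
    """AT&T-style operand: disp(%base,%index,scale)."""
    _check_scale(scale)
    if not base and not index:
        return str(disp)  # bare number = absolute memory address (no $)
    inner = f"%{base}" if base else ""
    if index:
        inner += f",%{index},{scale}"
    prefix = str(disp) if disp else ""
    return f"{prefix}({inner})"


def effective_address(regs: Dict[str, int], disp: int = 0,
                      base: Optional[str] = None, index: Optional[str] = None,
                      scale: int = 1) -> int:
    """Compute base + index*scale + disp, wrapped to 32 bits."""
    _check_scale(scale)
    addr = disp
    if base:
        addr += regs[base]
    if index:
        addr += regs[index] * scale
    return addr & 0xFFFFFFFF  # 32-bit address-size arithmetic wraps around


if __name__ == "__main__":
    operand = dict(disp=8, base="ebx", index="esi", scale=4)
    print(intel_mem(**operand))  # [ebx + esi*4 + 8]
    print(att_mem(**operand))    # 8(%ebx,%esi,4)
    regs = {"ebx": 0x1000, "esi": 3}  # hypothetical register contents
    print(hex(effective_address(regs, **operand)))  # 0x1014
```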
"should, theoretically at least, produce the same binary"
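This is entirely dependent on your toolchain. x86 often has more than one valid encoding for the same instruction (a register-to-register mov, for example, can use either the 89 or the 8B opcode), and each assembler makes its own choices.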
"what are the reasons to favour one or the other?"
Personal preference, of course. In my opinion it comes down to which syntax you feel more comfortable with when addressing memory. Do you prefer the forced explicitness of the AT&T syntax, or do you prefer your assembler figuring out these low-level minutiae for you?
"Is it command line options? Macros? Non-mnemonic keywords?"
This has to do with the assembler (GAS, NASM) itself. Again, personal preference.
NASM actually uses its own variation of Intel syntax, different from the MASM syntax used in Intel's official documentation. The opcode names and operand order are the same as in Intel's syntax, so the instructions look the same at first glance, but any significant program will have differences. For example, with MASM the instruction generated for MOV ax, foo depends on the type of foo, while NASM doesn't have types and always assembles this to a move-immediate instruction. When the size of an operand can't be determined implicitly, MASM requires something like DWORD PTR, where NASM uses DWORD to mean the same thing. Most of the syntax beyond the instruction mnemonics and the basic operand format and ordering is different.
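As a rough side-by-side, the hypothetical Python helper below formats a store of an immediate to memory with an explicit operand size in MASM, NASM and GAS AT&T syntax. It only produces strings and validates nothing; the point is the size marker (DWORD PTR vs dword vs the l suffix). NASM's size keywords are case-insensitive, so the lowercase spelling is just a style choice.

```python
# Size markers per operand width: (MASM/NASM keyword, AT&T suffix).
SIZES = {8: ("BYTE", "b"), 16: ("WORD", "w"), 32: ("DWORD", "l"), 64: ("QWORD", "q")}


def store_immediate(bits: int, base: str, value: int) -> dict:
    """Hypothetical helper: spell 'store value to [base]' with an explicit size.

    Only formats strings; it does not check that the register or the
    immediate is valid for the chosen width or mode.
    """
    keyword, suffix = SIZES[bits]
    return {
        "masm": f"mov {keyword} PTR [{base}], {value}",
        "nasm": f"mov {keyword.lower()} [{base}], {value}",
        "gas_att": f"mov{suffix} ${value}, (%{base})",
    }


if __name__ == "__main__":
    for syntax, text in store_immediate(32, "ebx", 1).items():
        print(f"{syntax:8} {text}")
```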
In terms of functionality NASM and GAS are pretty much the same. Both have assembler macro facilities, though NASM's is more extensive and more mature. Many GAS source code files use the C preprocessor instead of GAS's own macro support.
The biggest difference between the two assemblers is their support for 16-bit code. GAS doesn't have any support for defining x86 segments. With GAS you're limited to creating simple single-segment 16-bit binary images, basically just boot sectors and .COM files. NASM has full support for segments and supports OMF format object files, which you can use with a suitable linker to create segmented 16-bit executables.
In addition to the OMF object file format, NASM supports a number of formats that GAS doesn't. GAS normally only supports the native format for the machine it's running on, basically ELF, PE-COFF, or Mach-O. If you want to support a different format, you need to build a "cross-compiling" version of GAS for that format.
Another notable difference is that GAS has support for creating DWARF and Windows 64-bit unwind information (the latter required by the Windows x64 ABI), while with NASM you have to create the sections and fill in the data yourself.