I am trying to learn assembly language as a hobby, and I frequently use gcc -S to produce assembly output. This is pretty straightforward, but I fail to assemble the output, and I am curious whether this can be done at all. I tried both the standard (AT&T syntax) output and Intel syntax via -masm=intel. Neither can be assembled with nasm and linked with ld.
Therefore I would like to ask whether it is possible to generate assembly code that can then be assembled.
To be more precise, I used the following C code:
>> cat csimp.c
int main (void){
    int i,j;
    for(i=1;i<21;i++)
        j= i + 100;
    return 0;
}
I generated the assembly with gcc -S -O0 -masm=intel csimp.c, then tried to assemble it with nasm -f elf64 csimp.s and link it with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:
csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected
This is most probably due to assembly syntax that nasm does not accept. I hope to fix this without having to correct the output of gcc -S manually.
Edit:
I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce NASM-format assembly. You can see the output of objconv below, so I still need your help.
>>cat csimp.asm
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64
global main: ; **the ':' should be removed !!!**
SECTION .text ; section number 1, code
main: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov dword [rbp-4H], 1 ; 0004 _ C7. 45, FC, 00000001
jmp ?_002 ; 000B _ EB, 0D
?_001: mov eax, dword [rbp-4H] ; 000D _ 8B. 45, FC
add eax, 100 ; 0010 _ 83. C0, 64
mov dword [rbp-8H], eax ; 0013 _ 89. 45, F8
add dword [rbp-4H], 1 ; 0016 _ 83. 45, FC, 01
?_002: cmp dword [rbp-4H], 20 ; 001A _ 83. 7D, FC, 14
jle ?_001 ; 001E _ 7E, ED
pop rbp ; 0020 _ 5D
ret ; 0021 _ C3
; main End of function
SECTION .data ; section number 2, data
SECTION .bss ; section number 3, bss
Apparent solution:
I made a mistake when cleaning up the output of objconv. I should have run:
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
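For readers less used to sed, the cleanup command above can be spelled out in C. This is a minimal sketch, assuming a POSIX system with <regex.h>; clean_objconv_line and regex_delete_all are hypothetical helper names, not part of objconv. It applies the three deletions in order and then drops any line matching default *rel, in the same order sed runs its commands. Like the sed original, the matching is purely textual: align=1 also matches inside align=16, which would be left as 6.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Delete every match of the POSIX basic regular expression `pattern`
 * from `s`, in place, scanning left to right like sed's s/pattern//g.
 * Returns 0 on success, -1 if the pattern does not compile. */
static int regex_delete_all(char *s, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    char *p = s;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (*p != '\0' && regexec(&re, p, 1, &m, 0) == 0) {
        if (m.rm_eo == m.rm_so)  /* empty match: the patterns used below never */
            break;               /* produce one, so stop instead of looping    */
        /* shift the text after the match left, over the match */
        memmove(p + m.rm_so, p + m.rm_eo, strlen(p + m.rm_eo) + 1);
        p += m.rm_so;            /* resume scanning where the match started */
    }
    regfree(&re);
    return 0;
}

/* Apply the same edits as
 *   sed "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d"
 * to one line (without its trailing newline), in place.
 * Returns 1 to keep the line, 0 to delete it, -1 on a regex error. */
int clean_objconv_line(char *line)
{
    static const char *const patterns[] = { "align=1", "[a-z]*execute", ": *function" };
    regex_t re;
    int matched;

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
        if (regex_delete_all(line, patterns[i]) != 0)
            return -1;

    /* sed runs the d command after the substitutions, on the edited line */
    if (regcomp(&re, "default *rel", REG_NOSUB) != 0)
        return -1;
    matched = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return matched ? 0 : 1;
}
```

A caller would read the .asm file line by line (for example with fgets, after stripping the newline) and write out only the lines for which the function returns 1.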
All steps can be condensed into a bash script:
#!/bin/bash
a="${1%.c}"   # strip the .c file extension
# compile to an object file, without unwind tables to keep the output minimal
gcc -fno-asynchronous-unwind-tables -s -c "${a}.c"
# convert the object file to nasm format
./objconv/objconv -fnasm "${a}.o"
# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" "${a}.asm"
# assemble a 64-bit object file with nasm
nasm -f elf64 "${a}.asm"
# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s "${a}.o"
Running this script, I get the following ld warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
The executable produced this way crashes with a segmentation fault. I would appreciate your help.
There are many different assembly languages: for each CPU there are possibly multiple syntaxes (e.g. "Intel syntax", "AT&T syntax"), and then completely different directives, preprocessors, etc. on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.
GCC is only able to generate one dialect of assembly language for 32-bit 80x86, the one understood by GAS (even with -masm=intel, the output still uses GAS directives). This means it can't work with NASM, FASM, MASM, TASM, A86/A386, etc. It only works with GAS (and possibly with YASM in its "AT&T mode").
Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.
You basically can't, at least not directly. GCC can output assembly in Intel syntax, but NASM/MASM/TASM each have their own Intel syntax. They are largely similar, but there are also some differences that the assembler may not understand, so assembly fails.
The closest thing is probably having objdump show the assembly in Intel format (its -M intel disassembler option).
Peter Cordes suggests in the comments that the assembler directives will still target GAS, so they won't be recognized by NASM, for example. They typically have the same name, but GAS-style directives start with a '.', as in .section text (vs. section text).
The difficulty I think you hit with the entry point error was attempting to use ld on an object file whose entry point is named main, while ld was looking for an entry point named _start.
There are a couple of considerations. First, if you are linking with the C library to use functions like printf, linking will expect main as the entry point (the C library's startup code provides _start and calls main), but if you are not linking with the C library, ld will expect _start. Your script is very close, but to fully automate the process for any source file you will need some way to determine which entry point is required.
For example, the following is a conversion, using your approach, of a source file that includes printf. It was converted to nasm using objconv as follows:
Generate the object file:
Convert with objconv to a nasm-format assembly file:
(note: my version of objconv added DOS line endings, probably because of an option I missed; I just ran it through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
(note: if there are no standard library functions and you are linking with ld, change main to _start by adding the following expressions to your sed call)
(there are probably more elegant expressions for this; this was just an example)
Assemble with nasm (replacing the original object file):
Using gcc for the link:
Test:
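To fully automate the choice between the two entry points, the script needs some way to tell them apart. One possible heuristic, an assumption rather than part of the answer above: assuming objconv declares symbols that the object file uses but does not define (such as printf) with extern, the presence of such a line suggests the C library is needed and gcc should do the linking (its startup code calls main); otherwise plain ld with an _start entry point should work. needs_libc and choose_linker are hypothetical names. The sketch ignores _GLOBAL_OFFSET_TABLE_, which according to a later answer objconv emits when compiling with -fPIC.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical heuristic: 1 if the nasm text declares any external symbol
 * other than _GLOBAL_OFFSET_TABLE_, which suggests a libc dependency. */
int needs_libc(const char *asm_text)
{
    const char *line = asm_text;

    while (line != NULL && *line != '\0') {
        line += strspn(line, " \t");                  /* skip indentation */
        if (strncmp(line, "extern", 6) == 0 && (line[6] == ' ' || line[6] == '\t')) {
            const char *sym = line + 6 + strspn(line + 6, " \t");
            if (strncmp(sym, "_GLOBAL_OFFSET_TABLE_", 21) != 0)
                return 1;
        }
        line = strchr(line, '\n');                    /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return 0;
}

/* Hypothetical: "gcc" (libc startup code calls main) or "ld" (entry point _start). */
const char *choose_linker(const char *asm_text)
{
    return needs_libc(asm_text) ? "gcc" : "ld";
}
```

In a script, the result would decide both whether main is renamed to _start and which link command runs.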
I don't have enough reputation to post a comment, but for me, following David C. Rankin's answer above results in a relocation error and a suggestion to compile with -fPIC. simp.c:
Then I run the following:
And I get the following error:
Note: I tried adding -fPIC to the object compilation, and it does add an extern _GLOBAL_OFFSET_TABLE_ entry to the nasm file generated by objconv, but the code doesn't appear to actually use it.