How to generate assembly code with gcc that can be

Posted 2020-08-23 01:23

Question:

I am trying to learn assembly language as a hobby, and I frequently use gcc -S to produce assembly output. That part is straightforward, but I cannot get the generated assembly to assemble. I was curious whether this can be done at all. I tried both the default output and Intel syntax via -masm=intel; neither can be assembled with nasm and linked with ld.

Therefore I would like to ask whether it is possible to generate assembly code that can then be assembled with nasm.

To be more precise, I used the following C code.

 >> cat csimp.c
 int main (void){
   int i,j;
   for(i=1;i<21;i++)
     j= i + 100;
   return 0;
 }

I generated the assembly with gcc -S -O0 -masm=intel csimp.c, then tried to assemble it with nasm -f elf64 csimp.s and link it with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:

csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected

This is most probably because nasm does not accept the assembly syntax that gcc emits. My hope is that I can fix this without having to manually correct the output of gcc -S.


Edit:

I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce nasm assembly format. You can see the output of objconv below. Therefore I still need your help.

>>cat csimp.asm 
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64

global main:  ; **the ':' should be removed !!!** 


SECTION .text                                           ; section number 1, code

main:   ; Function begin
        push    rbp                                     ; 0000 _ 55
        mov     rbp, rsp                                ; 0001 _ 48: 89. E5
        mov     dword [rbp-4H], 1                       ; 0004 _ C7. 45, FC, 00000001
        jmp     ?_002                                   ; 000B _ EB, 0D

?_001:  mov     eax, dword [rbp-4H]                     ; 000D _ 8B. 45, FC
        add     eax, 100                                ; 0010 _ 83. C0, 64
        mov     dword [rbp-8H], eax                     ; 0013 _ 89. 45, F8
        add     dword [rbp-4H], 1                       ; 0016 _ 83. 45, FC, 01
?_002:  cmp     dword [rbp-4H], 20                      ; 001A _ 83. 7D, FC, 14
        jle     ?_001                                   ; 001E _ 7E, ED
        pop     rbp                                     ; 0020 _ 5D
        ret                                             ; 0021 _ C3
; main End of function


SECTION .data                                           ; section number 2, data


SECTION .bss                                            ; section number 3, bss

Apparent solution:

I made a mistake when cleaning up the output of objconv. I should have run:

sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d" csimp.asm

All steps can be condensed into a bash script:

#!/bin/bash

a="${1%.c}" # strip the file extension .c

# compile to an object file with minimal extra information
gcc -fno-asynchronous-unwind-tables -s -c "${a}.c"

# convert the object file to nasm format
./objconv/objconv -fnasm "${a}.o"

# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d" "${a}.asm"

# assemble with nasm into a 64-bit object file
nasm -f elf64 "${a}.asm"

# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s "${a}.o"

Running this script, I get the ld warning:

 ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080 

The executable produced in this manner crashes with a segmentation fault. I would appreciate your help.

Answer 1:

I think the difficulty you hit with the entry point was running ld on an object file whose entry point is named main, while ld was looking for an entry point named _start.

There are a couple of considerations. First, if you link against the C library (to use functions like printf), the C runtime startup code provides _start and calls main, so main is the entry point your code must define; if you are not linking with the C library, ld expects your code to define _start. Your script is very close, but to fully automate the process for any source file you will need some way to determine which entry point is needed.

For example, the following is a conversion, using your approach, of a source file that calls printf. It was converted to nasm format using objconv as follows:
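One way to tell the two cases apart, sketched under assumptions rather than taken from the script in the question: treat any extern declaration in the objconv output as a sign that the C library is needed. The helper name pick_linker and the two sample files are hypothetical, and the extern check is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): guess which link
# driver an objconv-generated .asm file needs.
# Assumption: an "extern" line means the code calls into a library such as libc.
pick_linker() {
    if grep -Eq '^[[:space:]]*extern[[:space:]]' "$1"; then
        echo gcc   # gcc links the C runtime, whose _start calls main
    else
        echo ld    # bare ld: the file itself must provide _start
    fi
}

# Two made-up inputs to exercise the helper.
tmpdir=$(mktemp -d)
printf 'extern printf\nglobal main\n' > "$tmpdir/uses_libc.asm"
printf 'global main\n' > "$tmpdir/standalone.asm"

libc_choice=$(pick_linker "$tmpdir/uses_libc.asm")
bare_choice=$(pick_linker "$tmpdir/standalone.asm")
echo "$libc_choice $bare_choice"
rm -rf "$tmpdir"
```

Even on the ld route, a main that has been renamed to _start and still ends in ret has no caller to return to, so the program would most likely also need an explicit exit system call. The sketch does not handle that.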

Generate the object file:

gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj

Convert with objconv to a nasm-format assembly file:

objconv -fnasm s3.obj

(note: my version of objconv added DOS line endings, probably because I missed an option; I just ran the file through dos2unix)

Using a slightly modified version of your sed call, tweak the contents:

sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm

(note: if the program uses no standard library functions and you link with ld, change main to _start by adding the following expressions to your sed call)

-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'

(there are probably more elegant expressions for this, this was just for example)
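To make the effect of these expressions visible, here is a small sketch that runs them over a made-up sample. The sample only imitates the kind of lines objconv emits and is an assumption, not captured output. It uses sed without -i so nothing on disk is modified.

```shell
#!/bin/sh
# Made-up sample imitating objconv's NASM output (an assumption, not real output).
sample='default rel

global main: function

SECTION .text   align=1 execute

main:   ; Function begin
        ret
; main End of function'

# Step 1: the cleanup expressions from the sed call above.
cleaned=$(printf '%s\n' "$sample" | sed -e 's/align=1//g' -e 's/[a-z]*execute//g' \
    -e 's/: *function//g' -e '/default *rel/d')

# Step 2: the extra expressions that rename main to _start.
renamed=$(printf '%s\n' "$cleaned" | sed -e 's/^main/_start/' \
    -e 's/[ ]main[ ]*.*$/ _start/')

printf '%s\n' "$renamed"
```

Note that the second rename expression also rewrites comment lines that mention main, which is harmless here but shows why a more targeted expression might be preferable.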

Assemble with nasm (replacing the original object file):

nasm -felf64 -o s3.obj s3.asm

Link using gcc:

gcc -o s3 s3.obj

Test

$ ./s3

 sizeof test : 40

 myint  : 0  0
 mychar : 4  4
 myptr  : 8  8
 myarr  : 16  16
 myuint : 32  32


Answer 2:

You basically can't, at least not directly. GCC can output assembly in Intel syntax, but NASM, MASM, and TASM each have their own flavor of Intel syntax. The flavors are largely similar, but there are enough differences that one assembler may fail to assemble code written for another.

The closest thing is probably having objdump show the assembly in Intel format:

objdump -d $file -M intel

Peter Cordes points out in the comments that the assembler directives will still target GAS, so NASM, for example, won't recognize them. They typically have the same names, but GAS directives start with a dot, as in .section .text (vs. section .text in NASM).
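To see what that means in practice, here is a small sketch that lists the GAS directives in a compiler-style listing. The listing is a hand-written excerpt in the style of gcc -S -masm=intel output, not captured output, so its exact contents are an assumption. NASM reads a leading dot as the start of a local label, which would be consistent with the "local label" errors reported in the question.

```shell
#!/bin/sh
# Hand-written excerpt in the style of `gcc -S -masm=intel` output (an assumption).
listing='        .file   "csimp.c"
        .intel_syntax noprefix
        .text
        .globl  main
main:
        push    rbp
        mov     rbp, rsp'

# List every line whose first token starts with a dot: these are GAS directives.
directives=$(printf '%s\n' "$listing" | grep -nE '^[[:space:]]*\.[a-z_]+')
printf '%s\n' "$directives"

# Count them; the instructions and the main: label are not matched.
count=$(printf '%s\n' "$directives" | wc -l | tr -d ' ')
echo "directive lines: $count"
```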



Answer 3:

There are many different assembly languages: for each CPU there may be multiple syntaxes (e.g. "Intel syntax", "AT&T syntax"), and then completely different directives, preprocessors, etc. on top of that. It adds up to about 30 different dialects of assembly language for 32-bit 80x86 alone.

GCC is only able to generate one dialect of assembly language for 32-bit 80x86. This means its output can't be used with NASM, FASM, MASM, TASM, A86/A386, etc. It only works with GAS (and possibly YASM in its "AT&T mode").

Of course you can compile code with 3 different compilers into 3 different types of assembly, then write 3 more different pieces of code (in 3 more different types of assembly) yourself; then assemble all of that (each with their appropriate assembler) into object files and link all the object files together.



Answer 4:

I don't have enough reputation to post a comment, but following David C. Rankin's answer above gives me a relocation error and a suggestion to recompile with -fPIC. Here is simp.c:

#include <stdio.h> 

int main (void){
 int i,j;
   for(i=1;i<21;i++){ 
     j= i + 100;
     printf("got int: %d\n",j); 
   }
 return(0);
}

Then I run the following:

rm *.obj *.o *.asm 
gcc -fno-asynchronous-unwind-tables -s -c simp.c -o simp.obj
objconv -fnasm simp.obj
dos2unix simp.asm 
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e 's/: *function//g' -e '/default *rel/d' simp.asm 
nasm -felf64 -o simp2.obj simp.asm
gcc -o my_simp simp2.obj

And get the following error:

/usr/bin/ld: simp2.obj: relocation R_X86_64_PC32 against symbol `printf@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: nonrepresentable section on output
collect2: error: ld returned 1 exit status

Note: I tried adding -fPIC when compiling the object file. It does add an extern _GLOBAL_OFFSET_TABLE_ entry to the nasm output generated by objconv, but the code doesn't appear to actually use it.
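The error message indicates that the linker is building a position-independent executable (PIE), which many distributions' gcc builds do by default. A workaround often suggested for this situation, not verified against this exact setup, is to link with gcc's -no-pie option. The sketch below only prints the commands by default; DRY_RUN and the run helper are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same assemble step as above, then a non-PIE link.
# -no-pie is a real gcc option; whether it resolves this particular error
# on a given system is an assumption.
steps=$(
    run nasm -felf64 -o simp2.obj simp.asm
    run gcc -no-pie -o my_simp simp2.obj
)
printf '%s\n' "$steps"
```

Setting DRY_RUN=0 would execute the two commands instead of printing them, which requires nasm, gcc, and the simp.asm file from the steps above.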