Assembly language standard

Posted 2019-01-27 21:56

Question:

Is there a standard that defines the syntax and semantics of assembly language, similar to how C has an ISO standard and C# has an ECMA standard? Is there only one standard, or are there several?

I'm asking because I noticed that assembly language code looked different in Windows and Linux environments. I had hoped that assembly language was OS-independent: a language with a defined standard that an assembler (the compiler for assembly language) translates into machine instructions for a particular processor.

Thank you for your answers.

Answer 1:

The closest thing to a standard is that the vendor that created the processor/instruction set will have a document describing that language, and that vendor will often provide some sort of assembler (program). Some vendors are more detail- and standards-oriented than others, so you get what you get. Then things like the Intel/AT&T split come along and mess things up. Add to that the GNU assembler's habit of messing up the assembly language for the chips it supports as well, and in general you have chaos.

If there were an assembly language whose use were comparable to C or C++, you would expect an organization to try to come up with a standard. Part of the problem would still be that with a language like C there is a layer of interpretation (the compiler) before the code hits the hardware, whereas with assembler there is little to none. So a chip vendor is going to make whatever they want to make based on market factors, and the standard would have to be dragged along to match the hardware, instead of the other way around where a standard drives the vendors.

An opencore processor might be one that could be standards-driven, since it is not vendor-specific; perhaps it already is.

With assembly, assume that each version of each assembler program/software/tool has its own syntax rules, both within the same instruction set and across different instruction sets (which is actually what you get with C/C++ too, but that is another topic). Either choose your favorite tool and only know it, or try to memorize all the variations across all the tools. My preference is to avoid as much tool-specific syntax and nuance as possible and to find the middle ground that works, or at least has a chance to work or port across tools.
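To illustrate how the rules differ between tools even for one instruction set, here is a small sketch for GNU as in its Intel-syntax mode, with the NASM spelling of each line given in the comments. It assumes a 32-bit x86 target and a GNU toolchain; the labels `msg`, `msglen` and `demo` are made up for the example. Note that both tools use "Intel syntax", yet a bare symbol means a memory operand in GNU as (MASM style) and an address in NASM.

```asm
# GNU as, 32-bit x86, Intel-syntax mode. Assumed build: as --32 demo.s -o demo.o
# The comments show how NASM would spell the same line.
        .intel_syntax noprefix
        .data
msg:    .ascii  "Hi\n"              # NASM: msg: db "Hi", 10
        .set    msglen, . - msg     # NASM: msglen equ $ - msg

        .text
        .globl  demo
demo:
        mov     al, msg             # bare symbol = memory operand: loads the byte stored at msg
                                    # NASM would write: mov al, [msg]
        mov     ecx, OFFSET msg     # loads the address of msg
                                    # NASM would write: mov ecx, msg
        ret
```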



Answer 2:

Yes, there is a standard.

People who built assemblers, even up until the 1980s, chose an incredible variety of syntax schemes.

The IEEE community reacted with a standard to try to avoid that problem:

694-1985 - IEEE Standard for Microprocessor Assembly Language

As with many things in the software world, it was and continues to be largely ignored.



Answer 3:

No, there is no standard. Even for x86 alone there are two different syntaxes: the Intel syntax, which is predominant on Windows platforms, and the AT&T syntax, which is dominant in the *nix world. Regarding the different-looking code on Wikipedia: the Windows example uses the Win32 API, and the Linux example makes a system call through interrupt 0x80.



Answer 4:

Assembly languages differ from processor to processor, so no, there is no standard.

In general, the "standard" assembly language for a particular family of processors is whatever the processor designers say it is. For example, the "standard" syntax for x86 is whatever Intel says it is. However, that doesn't prevent other people from creating a variant of the assembly language that targets the processor with slightly different syntax or additional features (NASM is one example).



Answer 5:

Well, I'm not sure whether you are asking about syntax for x86 processors (I suppose so, because you mentioned NASM).

But there are two common syntax conventions:

  • Intel syntax that was originally used for documentation of the x86 platform
  • AT&T syntax which is common in Linux/Unix worlds.

NASM, which you mentioned, uses an Intel-style syntax; it does not accept AT&T syntax.

You can find some examples of the syntax differences in this article: http://www.ibm.com/developerworks/linux/library/l-gas-nasm/index.html.
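As a quick illustration of the main differences, here is a sketch that writes the same three x86-64 instructions in both syntaxes. It assumes the GNU assembler, which can switch between the two with its `.att_syntax` and `.intel_syntax` directives; NASM would accept only the Intel-style half, and the label `demo` is made up for the example.

```asm
# GNU as, x86-64. Assumed build: as syntax.s -o syntax.o
# Disassembling with objdump -d should show each pair as the same instruction.
        .text
        .globl  demo
demo:
        # AT&T syntax (the GNU as default): source first, destination last,
        # % before registers, $ before immediates, operand size as a mnemonic suffix.
        .att_syntax prefix
        movl    $1, %eax
        movl    8(%rbx), %ecx
        addq    %rcx, %rax

        # Intel syntax: destination first, no sigils, operand size implied by the
        # registers or spelled out with DWORD PTR for memory operands.
        .intel_syntax noprefix
        mov     eax, 1
        mov     ecx, DWORD PTR [rbx+8]
        add     rax, rcx
        ret
```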



Answer 6:

There's none, because there are many different CPUs with different instructions and other peculiarities, and it's entirely up to their designers what syntax to use and how to name things. And there's little need to standardize that, because assembly code is inherently unportable and needs to be rewritten for a different CPU anyway.

Assembly language is not OS-specific per se, it's CPU-specific, but for an assembly routine to access things that seem standard to you (e.g. a subroutine that prints text to the console), OS-specific code is needed. For MS-DOS you'd use the BIOS and DOS interrupt service routines (invoked on the x86 CPU through the int 13h, int 10h, int 21h, int 33h, etc. instructions). For Windows you'd use the Windows system services (reachable through int 2eh and the sysenter/syscall instructions, although programs normally reach them indirectly by calling documented Win32 API functions in DLLs such as kernel32.dll). For Linux you'd use the Linux system calls (e.g. via int 80h). All of them are implemented differently in different OSes and expect different numbers and kinds of parameters in different places (registers or memory). You can't standardize this part. The only thing you can do about it is build a compatibility/abstraction layer on top of the OS functionality so that it looks the same from your assembly routines' point of view.
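For comparison with the Linux int 80h route, here is a minimal sketch of printing a string under MS-DOS in NASM syntax. It assumes a .COM program built with `nasm -f bin`; the label `msg` is made up for the example. The same x86 instructions are used as on Linux, but the OS interface (int 21h, function number in AH, '$'-terminated string) is entirely different.

```asm
; MS-DOS .COM program, NASM syntax. Assumed build: nasm -f bin hello.asm -o hello.com
        org     100h            ; .COM files are loaded at offset 100h
        mov     dx, msg         ; DS:DX -> '$'-terminated string (DS = CS in a .COM file)
        mov     ah, 09h         ; DOS function 09h: write string to standard output
        int     21h
        mov     ax, 4C00h       ; DOS function 4Ch: terminate with return code AL = 0
        int     21h
msg     db      'Hello, world!', 13, 10, '$'
```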



Answer 7:

Assembly syntax/language depends on the CPU rather than the OS. For the x86 CPU family, however, there are two syntaxes: AT&T (used by default on Unix-like operating systems) and Intel (used by Windows, DOS, etc.).

However, the two assembly examples on the wiki do different things. The Windows example uses the Win32 API to show a message box: all function arguments are pushed onto the stack in reverse order, and then it calls the function MessageBox(), which in turn creates the message box.
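A minimal sketch of that pattern in 32-bit NASM follows. It assumes linking against the Microsoft import libraries user32.lib and kernel32.lib, which export the stdcall-decorated names `_MessageBoxA@16` and `_ExitProcess@4`; other linkers may expect undecorated names, and the exact link command and entry-point option depend on your toolchain. MessageBox is really a macro for MessageBoxA (ANSI) or MessageBoxW (Unicode); the labels `caption` and `message` are made up for the example.

```asm
; 32-bit Windows, NASM syntax. Assumed build: nasm -f win32 msgbox.asm, then link
; with user32.lib and kernel32.lib (linker options depend on the toolchain).
        extern  _MessageBoxA@16     ; stdcall names are decorated with @<argument bytes>
        extern  _ExitProcess@4

        section .data
caption db      "Hello", 0
message db      "Hello, world!", 0

        section .text
        global  _start
_start:
        ; MessageBoxA(hWnd, lpText, lpCaption, uType): the last argument is pushed first
        push    dword 0             ; uType = 0 (MB_OK)
        push    dword caption       ; lpCaption
        push    dword message       ; lpText
        push    dword 0             ; hWnd = NULL, no owner window
        call    _MessageBoxA@16     ; stdcall: the callee pops its 16 bytes of arguments

        push    dword 0             ; exit code 0
        call    _ExitProcess@4      ; does not return
```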

The Linux example uses the write syscall to write a string to stdout. Here all "arguments" are placed in registers, and then int 0x80 raises a software interrupt: the CPU switches into kernel mode, and the kernel writes the string to stdout.
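A minimal sketch of such a raw system-call version, in 32-bit NASM syntax, is below. It assumes a 32-bit Linux build linked with `ld` and no C library; the numbers 4 (write) and 1 (exit) come from the i386 system-call table. 64-bit code would instead use the `syscall` instruction with a different numbering and different registers.

```asm
; 32-bit Linux, NASM syntax, no libc.
; Assumed build: nasm -f elf32 hello.asm && ld -m elf_i386 hello.o -o hello
        section .data
msg:    db      "Hello, world!", 10
.len:   equ     $ - msg         ; length of the string, including the newline

        section .text
        global  _start
_start:
        mov     eax, 4          ; syscall number 4 = write (i386 table)
        mov     ebx, 1          ; fd 1 = stdout
        mov     ecx, msg        ; pointer to the buffer
        mov     edx, msg.len    ; number of bytes to write
        int     0x80            ; trap into the kernel

        mov     eax, 1          ; syscall number 1 = exit
        xor     ebx, ebx        ; exit status 0
        int     0x80
```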

The Linux assembly could be rewritten like this:

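; 32-bit Linux, NASM syntax, calling the C library instead of using int 0x80.
; Build: nasm -f elf32 hello.asm && gcc -m32 -no-pie hello.o -o hello
section .data
msg:   db     "Hello, world!", 10
.len: equ    $ - msg

section .text

extern write
extern exit

global main                     ; gcc's startup code provides _start and calls main
main:
        push dword msg.len      ; cdecl: arguments are pushed right to left (count)
        push dword msg          ; buffer
        push dword 1            ; fd 1 = stdout
        call write              ; write(1, msg, msg.len)
        add  esp, 12            ; cdecl: the caller removes its 3 arguments

        push dword 0            ; exit status
        call exit               ; exit(0) does not return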

The above assembly must be linked against libc; it then calls libc's write, which in turn performs the same write system call as the example on the wiki.

Another thing to note is that Windows and Unix-like operating systems use different file formats for their libraries and applications.

Unix-like systems use ELF (http://en.wikipedia.org/wiki/Executable_and_Linkable_Format) and Windows uses PE (http://en.wikipedia.org/wiki/Portable_Executable).

This is why you see different sections in the assembly examples on the wiki page.