My questions are as follows:
- How does the Portable Executable format (on windows/unix) relate to the x86/x64 instruction set in general?
- Does the PE format store the exact set of opcodes supported by the processor, or is it a more generic format that the OS converts to match the CPU?
- How does the EXE file indicate the instruction set extensions needed (like 3DNOW! or SSE/MMX?)
- Are the opcodes common across all platforms like Windows, Mac and unix?
- Intel i386 compatible CPU chips like ones from Intel and AMD use a common instruction set. But I'm sure ARM-powered CPUs use different opcodes. Are these very very different or are the concepts similar? registers, int/float/double, SIMD, etc?
On newer platforms like .NET, Java or Flash, the instruction sets are stack-based opcodes that a JIT converts to the native format at runtime. Being accustomed to such a format I'd like to know how the "old" native EXE format is executed and formatted. For example, "registers" are usually unavailable in newer platform opcodes, since the JIT converts stack commands to the 16/32 available CPU registers as it deems necessary. But in native formats you need to refer to registers by index, and work out which registers can be reused and how often.
Are ARM opcodes very different from x86 opcodes?
Yes, they are. You should assume that all instruction sets for different processor families are completely different and incompatible. An instruction set first defines an encoding, which specifies one or more of these:
- the instruction opcode;
- the addressing mode;
- the operand size;
- the address size;
- the operands themselves.
The encoding further depends on how many registers it can address, whether it has to be backwards compatible, whether it has to be decodable quickly, and how complex the instructions can be.
On the complexity: ARM is a load/store architecture, so its arithmetic instructions operate only on registers. Operands must first be loaded from memory into a register, and results are written back to memory, using specialized load/store instructions. Most x86 instructions, by contrast, can encode a single memory address as one of their operands, so a separate load or store instruction is often unnecessary.
Then the instruction set itself: different processors will have specialized instructions to deal with specific situations. Even if two processor families have the same instruction for the same thing (e.g. an add instruction), they are encoded very differently and may have slightly different semantics.
As you can see, since any CPU designer can decide on all these factors, the instruction set architectures of different processor families end up completely different and incompatible.
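To make the encoding difference concrete, here is a minimal Python sketch that compares machine code for a register-to-register add on both architectures. The byte values are hand-assembled for this illustration (`add eax, ebx` on x86 and `ADD r0, r0, r1` on 32-bit ARM), and the names `X86_ADD_EAX_EBX`, `ARM_ADD_R0_R0_R1` and `describe` are made up for the example.

```python
# Hand-assembled machine code for "add one register to another".
# These byte sequences are illustrative; an assembler may choose a
# different but equivalent x86 encoding.

# x86: add eax, ebx -> opcode 0x01 (ADD r/m32, r32) followed by ModRM byte 0xD8
X86_ADD_EAX_EBX = bytes([0x01, 0xD8])

# ARM (32-bit state): ADD r0, r0, r1 -> the word 0xE0800001,
# stored little-endian in memory
ARM_ADD_R0_R0_R1 = (0xE0800001).to_bytes(4, "little")


def describe(name: str, code: bytes) -> str:
    """Return a one-line summary of an encoded instruction."""
    return f"{name}: {len(code)} bytes, {code.hex(' ')}"


if __name__ == "__main__":
    print(describe("x86", X86_ADD_EAX_EBX))
    print(describe("ARM", ARM_ADD_R0_R0_R1))
```

The two byte strings share nothing, even though both instructions do the same job, which is why code for one family cannot be executed by the other.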
Are registers, int/float/double and SIMD very different concepts on different architectures?
No, they are very similar. Every modern architecture has registers and can handle integers, and most can handle IEEE 754 compatible floating-point values of some size. For example, the x87 floating-point unit of the x86 architecture works internally with 80-bit floating-point values, which are rounded to fit the 32-bit or 64-bit floating-point values you know when they are stored. The idea behind SIMD instructions is also the same on all architectures that support it, but not every architecture supports it, and those that do have different requirements or restrictions for them.
Are the opcodes common across all platforms like Windows, Mac and Unix?
Given three Intel x86 systems, one running Windows, one running Mac OS X and one running Unix/Linux, then yes, the opcodes are exactly the same since they run on the same kind of processor. However, each operating system is different. Many aspects such as memory allocation, graphics, device driver interfacing and threading require operating system specific code, and each operating system also expects its own executable file format. So you generally can't run an executable compiled for Windows on Linux.
Does the PE format store the exact set of opcodes supported by the processor, or is it a more generic format that the OS converts to match the CPU?
No, the PE format is not a generic format that the OS converts to match the CPU. As explained earlier, the instruction set architectures of different processor families are simply too different to make this possible. A PE file usually stores machine code for one specific processor family and operating system family, and will only run on such processors and operating systems.
There is however one exception: .NET assemblies are also PE files, but they contain generic instructions that are not specific to any processor or operating system. Such PE files can be 'run' on other systems, but not directly. For example, Mono on Linux can run such .NET assemblies.
How does the EXE file indicate the instruction set extensions needed (like 3DNOW! or SSE/MMX?)
While the executable can indicate the instruction set for which it was built (see Chris Dodd's answer), I don't believe the executable can indicate the extensions that are required. However, the executable code, when run, can detect such extensions. For example, the x86 instruction set has a CPUID instruction that reports the extensions and features supported by that particular CPU as bit flags in registers. The executable would just test those flags and abort (or fall back to a slower code path) when the processor does not meet the requirements.
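As a rough sketch of what such a check looks like, the Python below decodes a few feature flags from the value CPUID leaves in the EDX register when called with EAX = 1 (bit 23 for MMX, bit 25 for SSE, bit 26 for SSE2). Python cannot execute CPUID itself, so `sample_edx` is a made-up register value, and the helper names are hypothetical.

```python
# Bit positions in EDX after executing CPUID with EAX = 1.
EDX_FEATURE_BITS = {"MMX": 23, "SSE": 25, "SSE2": 26}


def decode_features(edx: int) -> set:
    """Return the names of the features whose flag bit is set in edx."""
    return {name for name, bit in EDX_FEATURE_BITS.items() if (edx >> bit) & 1}


def missing_features(edx: int, required) -> set:
    """Return the required features that the CPU does not report."""
    return set(required) - decode_features(edx)


if __name__ == "__main__":
    # Made-up register value: MMX and SSE present, SSE2 absent.
    sample_edx = (1 << 23) | (1 << 25)
    missing = missing_features(sample_edx, {"SSE", "SSE2"})
    if missing:
        print("Cannot run, CPU lacks:", ", ".join(sorted(missing)))
```

A native program would obtain the register value through inline assembly or a compiler intrinsic and then run the same kind of bit test at startup.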
.NET versus native code
You seem to know a thing or two about .NET assemblies and their instruction set, called CIL (Common Intermediate Language). Each CIL instruction follows a specific encoding and uses the evaluation stack for its operands. The CIL instruction set is kept very general and high-level. When an assembly is run (on Windows by mscoree.dll, on Linux by mono) and a method is called for the first time, the Just-In-Time (JIT) compiler takes the method's CIL instructions and compiles them to machine code. Depending on the operating system and processor family, the compiler has to decide which machine instructions to use and how to encode them. The compiled result is stored somewhere in memory. The next time the method is called, the code jumps directly to the compiled machine code and can execute just as efficiently as a native executable.
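The step from stack-based instructions to register-based ones can be illustrated with a toy translator. This is only a sketch under heavy assumptions: the `push`/`add`/`mul` operations are an invented mini-language rather than real CIL, the output is pseudo-assembly rather than real x86 or ARM, and `stack_to_registers` is a hypothetical name. A real JIT also reuses registers and spills to memory, which this sketch does not attempt.

```python
def stack_to_registers(program):
    """Translate toy stack code into two-operand register pseudo-assembly."""
    stack = []      # names of the registers holding the virtual stack's values
    output = []     # generated pseudo-assembly lines
    next_reg = 0    # naive allocation: every push takes a fresh register

    for op, *args in program:
        if op == "push":
            reg = f"r{next_reg}"
            next_reg += 1
            output.append(f"mov {reg}, {args[0]}")
            stack.append(reg)
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            # Result overwrites the left operand's register.
            output.append(f"{op} {left}, {right}")
            stack.append(left)
        else:
            raise ValueError(f"unknown operation: {op}")
    return output


if __name__ == "__main__":
    # Computes (2 + 3) * 4 on the virtual stack.
    program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
    for line in stack_to_registers(program):
        print(line)
```

The stack code never names a register; the translator decides where each value lives, which is the decision a native compiler makes ahead of time and a JIT makes at run time.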
How are ARM instructions encoded?
I have never worked with ARM, but from a quick glance at the documentation I can tell you the following. In the classic 32-bit ARM instruction set, an instruction is always 32 bits in length. There are many exceptional encodings (e.g. for branching and coprocessor instructions), but the general format of an ARM instruction is like this:
 31          28  27  26  25  24          21  20  19          16
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--
|   Condition   | 0 | 0 |R/I|    Opcode     | S |   Operand 1   | ...
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--

     15          12  11                                           0
  --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
... |  Destination  |                   Operand 2                   |
  --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The fields mean the following:
- Condition: A condition that, when true, causes the instruction to be executed. This looks at the Zero, Carry, Negative and Overflow flags. When set to 1110, the instruction is always executed.
- R/I: When 0, operand 2 is a register. When 1, operand 2 is a constant value.
- Opcode: The instruction's opcode.
- S: When 1, the Zero, Carry, Negative and Overflow flags are set according to the instruction's result.
- Operand 1: The index of a register that is used as the first operand.
- Destination: The index of a register that is used as the destination operand.
- Operand 2: The second operand. When R/I is 0, the index of a register. When R/I is 1, an unsigned 8-bit constant value. In addition to either one of these, some bits in operand 2 indicate whether the value is shifted/rotated.
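The field layout above can be turned into a small decoder. The following Python is a minimal sketch: the function name `decode_data_processing` and the dictionary keys are my own, and it only checks that bits 27 and 26 are zero, so it does not distinguish other instruction classes that happen to share those bits.

```python
def decode_data_processing(word: int) -> dict:
    """Split a 32-bit ARM instruction word into the fields described above."""
    if (word >> 26) & 0b11 != 0:
        raise ValueError("bits 27-26 are not 00: not this instruction format")
    return {
        "condition":   (word >> 28) & 0xF,   # bits 31-28
        "immediate":   (word >> 25) & 0x1,   # bit 25, the R/I field
        "opcode":      (word >> 21) & 0xF,   # bits 24-21
        "set_flags":   (word >> 20) & 0x1,   # bit 20, the S field
        "operand1":    (word >> 16) & 0xF,   # bits 19-16
        "destination": (word >> 12) & 0xF,   # bits 15-12
        "operand2":    word & 0xFFF,         # bits 11-0
    }


if __name__ == "__main__":
    # 0xE0800001 is ADD r0, r0, r1 (hand-assembled for this example).
    for field, value in decode_data_processing(0xE0800001).items():
        print(f"{field:12} {value:#x}")
```

For the sample word, the condition field comes out as 0xE (binary 1110, always execute), R/I is 0, and operand 2 is register index 1.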
For more detailed information you should read the documentation for the specific ARM version you want to know about. I used this ARM7TDMI-S Data Sheet, Chapter 4 for this example.
Note that each ARM instruction, no matter how simple, takes 4 bytes to encode. Because of this possible overhead, many ARM processors allow you to use an alternative 16-bit instruction set called Thumb. It cannot express all the things the 32-bit instruction set can, but its instructions are also half as big.
On the other hand, x86-64 instructions have a variable length encoding, and use all kinds of modifiers to adjust the behavior of individual instructions. If you want to compare the ARM instructions with how x86 and x86-64 instructions are encoded, you should read the x86-64 Instruction Encoding article that I wrote on OSDev.org.
Your original question is very broad. If you want to know more, you should do some research and create a new question with the specific thing you want to know.
The PE file format (and the ELF/COFF file formats on non-Windows machines) defines a header near the beginning of the file, and in this header there is a 'Machine' code. (In a PE file the header follows a small MS-DOS stub; in ELF the equivalent field is called e_machine.) In a PE file, the 'Machine' code is 2 bytes, and the spec defines a bunch of constants for various machines:
0x1d3 Matsushita AM33
0x8664 AMD x64
0x1c0 ARM little endian
0x1c4 ARMv7 (or higher) Thumb mode only
0xebc EFI byte code
0x14c Intel 386 or later processors and compatible processors
0x200 Intel Itanium processor family
0x9041 Mitsubishi M32R little endian
0x266 MIPS16
0x366 MIPS with FPU
0x466 MIPS16 with FPU
0x1f0 Power PC little endian
0x1f1 Power PC with floating point support
0x166 MIPS little endian
0x1a2 Hitachi SH3
0x1a3 Hitachi SH3 DSP
0x1a6 Hitachi SH4
0x1a8 Hitachi SH5
0x1c2 ARM or Thumb (“interworking”)
0x169 MIPS little endian WCE v2
Then, within the PE (or ELF) file there are one or more 'Code' sections that contain (binary) machine code. That code is loaded into memory and executed directly by the CPU. The OS or dynamic linker/loader (which does the actual loading) knows what machine it is running on, so it checks the 'Machine' code in the header to make sure it matches before attempting to load and execute the code. If it doesn't match, the executable will be rejected, as it can't be run.
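The check a loader performs on the 'Machine' code can be sketched in a few lines of Python. A PE file starts with an MS-DOS stub (signature "MZ") whose 4-byte field at offset 0x3C gives the offset of the "PE\0\0" signature, and the 2-byte 'Machine' code follows that signature directly. The function name `read_pe_machine` and the trimmed `MACHINE_NAMES` table are assumptions made for this example, and the sample data is a synthetic buffer rather than a real executable.

```python
import struct

# A few entries from the table above, for illustration only.
MACHINE_NAMES = {
    0x14C: "Intel 386 or later",
    0x8664: "AMD x64",
    0x1C0: "ARM little endian",
}


def read_pe_machine(data: bytes) -> int:
    """Return the 2-byte 'Machine' code from the start of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # Offset 0x3C of the DOS stub holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


if __name__ == "__main__":
    # Build a synthetic, minimal header instead of reading a real file.
    sample = bytearray(0x86)
    sample[0:2] = b"MZ"
    struct.pack_into("<I", sample, 0x3C, 0x80)
    sample[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", sample, 0x84, 0x8664)

    machine = read_pe_machine(bytes(sample))
    print(hex(machine), MACHINE_NAMES.get(machine, "unknown"))
```

With a real file you would pass the file's contents, for example `read_pe_machine(open(path, "rb").read())`, and compare the result against the machine the loader is running on.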