How is code stored in the EXE format?

My questions are as follows:

How does the Portable Executable format (on Windows/Unix) relate to the x86/x64 instruction set in general?
Does the PE format store the exact set of opcodes supported by the processor, or is it a more generic format that the OS converts to match the CPU?
How does the EXE file indicate the instruction set extensions needed (like 3DNow! or SSE/MMX)?
Are the opcodes common across all platforms like Windows, Mac and Unix?
Intel i386-compatible CPUs, like the ones from Intel and AMD, use a common instruction set. But I'm sure ARM-powered CPUs use different opcodes. Are these very different, or are the concepts similar (registers, int/float/double, SIMD, etc.)?

On newer platforms like .NET, Java or Flash, the instruction sets are stack-based opcodes that a JIT converts to the native format at runtime. Being accustomed to such a format I'd like to know how the "old" native EXE format is executed and formatted. For example, "registers" are usually unavailable in newer platform opcodes, since the JIT converts stack commands to the 16/32 available CPU registers as it deems necessary. But in native formats you need to refer to registers by index, and work out which registers can be reused and how often.

Answer 1:

Are ARM opcodes very different from x86 opcodes?

Yes, they are. You should assume that all instruction sets for different processor families are completely different and incompatible. An instruction set first defines an encoding, which specifies one or more of these:

the instruction opcode;
the addressing mode;
the operand size;
the address size;
the operands themselves.

The encoding further depends on how many registers it can address, whether it has to be backwards compatible, if it has to be decodable quickly, and how complex the instruction can be.

On the complexity: the ARM instruction set requires all operands to be loaded from memory into registers, and results to be stored from registers back to memory, using specialized load/store instructions. Most x86 instructions, on the other hand, can encode a memory address as one of their operands, so the data does not have to be loaded into a register by a separate instruction first.

Then the instruction set itself: different processors will have specialized instructions to deal with specific situations. Even if two processor families have the same instruction for the same thing (e.g. an add instruction), they are encoded very differently and may have slightly different semantics.

As you see, since any CPU designer can decide on all these factors, this makes the instruction set architectures for different processor families completely different and incompatible.

Are registers, int/float/double and SIMD very different concepts on different architectures?

No, they are very similar. Every modern architecture has registers and can handle integers, and most can handle IEEE 754 compatible floating-point values of some size. For example, the x87 floating-point unit of the x86 architecture works internally with 80-bit floating-point values, which are rounded to fit the 32-bit or 64-bit floating-point values you know when they are stored. The idea behind SIMD instructions is also the same on all architectures that support them, but many architectures do not support them, and those that do each have different requirements or restrictions.

Are the opcodes common across all platforms like Windows, Mac and Unix?

Given three Intel x86 systems, one running Windows, one running Mac OS X and one running Unix/Linux, then yes, the opcodes are exactly the same, since they run on the same processor. However, each operating system is different. Many aspects such as memory allocation, graphics, device driver interfacing and threading require operating-system-specific code, and the executable file formats differ as well (PE on Windows, Mach-O on Mac OS X, ELF on Linux). So you generally can't run an executable compiled for Windows on Linux.

Does the PE format store the exact set of opcodes supported by the processor, or is it a more generic format that the OS converts to match the CPU?

No, the PE format is not a generic format that the OS converts to match the CPU. As explained earlier, the instruction set architectures of different processor families are simply too different to make this possible. A PE file usually stores machine code for one specific processor family and operating system family, and will only run on such processors and operating systems.

There is, however, one exception: .NET assemblies are also PE files, but they contain generic instructions that are not specific to any processor or operating system. Such PE files can be 'run' on other systems, but not directly. For example, Mono on Linux can run such .NET assemblies.

How does the EXE file indicate the instruction set extensions needed (like 3DNow! or SSE/MMX)?

While the executable can indicate the instruction set for which it was built (see Chris Dodd's answer), I don't believe the executable can indicate the extensions that are required. However, the executable code, when run, can detect such extensions. For example, the x86 instruction set has a CPUID instruction that reports the extensions and features supported by that particular CPU. The executable can test the result and abort (or fall back to a slower code path) when the processor does not meet the requirements.
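As a rough illustration of that check, here is a minimal Python sketch that decodes a few feature bits from the EDX and ECX values that CPUID leaf 1 returns. Python cannot execute CPUID itself, so the register values below are made-up sample inputs, and the function names are only for illustration; a native program would obtain the real values from the instruction. 3DNow! is reported through a different (extended) leaf and is not covered here.

```python
# Bit positions of some feature flags in the registers returned by CPUID leaf 1.
EDX_FEATURES = {"MMX": 23, "SSE": 25, "SSE2": 26}
ECX_FEATURES = {"SSE3": 0, "SSSE3": 9, "SSE4.1": 19, "SSE4.2": 20}


def decode_features(edx: int, ecx: int) -> set:
    """Return the names of the features whose bit is set in EDX/ECX."""
    found = {name for name, bit in EDX_FEATURES.items() if (edx >> bit) & 1}
    found |= {name for name, bit in ECX_FEATURES.items() if (ecx >> bit) & 1}
    return found


def missing_features(available: set, required: set) -> set:
    """Return the required features that the CPU does not report."""
    return required - available


# Hypothetical register values: a CPU reporting MMX, SSE, SSE2 and SSE3.
sample_edx = (1 << 23) | (1 << 25) | (1 << 26)
sample_ecx = 1 << 0

available = decode_features(sample_edx, sample_ecx)
missing = missing_features(available, {"SSE2", "SSE4.2"})
if missing:
    print("CPU lacks:", ", ".join(sorted(missing)))
```

A real program would typically run this kind of test once at startup and then either exit with an error message or select a code path that avoids the missing instructions.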

.NET versus native code

You seem to know a thing or two about .NET assemblies and their instruction set, called CIL (Common Intermediate Language). Each CIL instruction follows a specific encoding and uses the evaluation stack for its operands. The CIL instruction set is kept very general and high-level. When it is run (on Windows by mscoree.dll, on Linux by mono) and a method is called, the Just-In-Time (JIT) compiler takes the method's CIL instructions and compiles them to machine code. Depending on the operating system and processor family the compiler has to decide which machine instructions to use and how to encode them. The compiled result is stored somewhere in memory. The next time the method is called the code jumps directly to the compiled machine code and can execute just as efficiently as a native executable.
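To give a feel for the stack-to-register step, here is a deliberately naive Python sketch. It is not how the .NET or Mono JIT actually works; the instruction names, the register names r0..r7 and the output syntax are all invented for this example. It simply maps each evaluation stack slot to one register, so the stack depth decides which register an operand lives in.

```python
def stack_to_registers(program, num_registers=8):
    """Translate stack-based instructions into made-up register instructions.

    Stack slot N is always kept in register rN (a very naive allocation).
    """
    output = []
    depth = 0  # current height of the evaluation stack
    for instruction in program:
        op = instruction[0]
        if op == "push":
            if depth == num_registers:
                # A real JIT would spill values to memory here.
                raise ValueError("out of registers")
            output.append(f"mov r{depth}, {instruction[1]}")
            depth += 1
        elif op in ("add", "sub", "mul"):
            if depth < 2:
                raise ValueError("stack underflow")
            # Combine the top two slots; the result replaces the lower one.
            output.append(f"{op} r{depth - 2}, r{depth - 1}")
            depth -= 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output


# (2 + 3) * 4 expressed as stack operations.
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
for line in stack_to_registers(program):
    print(line)
```

In hand-written or compiler-generated native code, this bookkeeping (which value lives in which register, and when a register can be reused) has already been done before the EXE is written to disk.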

How are ARM instructions encoded?

I have never worked with ARM, but from a quick glance at the documentation I can tell you the following. An ARM instruction is always 32 bits in length. There are many exceptional encodings (e.g. for branching and coprocessor instructions), but the general format of an ARM instruction is like this:

31             28  27  26  25              21  20              16
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--
|   Condition   | 0 | 0 |R/I|    Opcode     | S |   Operand 1   | ...
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--

                   12                                               0
  --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
... |  Destination  |               Operand 2                       |
  --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The fields mean the following:

Condition: A condition that, when true, causes the instruction to be executed. This looks at the Zero, Carry, Negative and Overflow flags. When set to 1110, the instruction is always executed.
R/I: When 0, operand 2 is a register. When 1, operand 2 is a constant value.
Opcode: The instruction's opcode.
S: When 1, the Zero, Carry, Negative and Overflow flags are set according to the instruction's result.
Operand 1: The index of a register that is used as the first operand.
Destination: The index of a register that is used as the destination operand.
Operand 2: The second operand. When R/I is 0, the index of a register. When R/I is 1, an unsigned 8-bit constant value. In addition to either one of these, some bits in operand 2 indicate whether the value is shifted/rotated.
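The field layout above can be sketched as a small decoder. This is a simplified Python illustration, not a complete disassembler: the function name is made up, it ignores the shift bits of a register operand, and it does not exclude the other encodings that also have bits 27 and 26 set to 00. The two sample words are the encodings of ADD r0, r0, r1 and ADD r1, r1, #5.

```python
# Names for the 4-bit Opcode field of ARM data-processing instructions.
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]


def decode_data_processing(word: int) -> dict:
    """Split a 32-bit ARM data-processing instruction into its fields."""
    if (word >> 26) & 0b11 != 0:
        raise ValueError("bits 27-26 are not 00")
    fields = {
        "condition": (word >> 28) & 0xF,        # 0b1110 means "always"
        "immediate": bool((word >> 25) & 1),    # the R/I bit
        "opcode": OPCODES[(word >> 21) & 0xF],
        "set_flags": bool((word >> 20) & 1),    # the S bit
        "operand1": (word >> 16) & 0xF,         # register index
        "destination": (word >> 12) & 0xF,      # register index
    }
    operand2 = word & 0xFFF
    if fields["immediate"]:
        # 8-bit constant rotated right by twice the 4-bit rotate field.
        rotate = (operand2 >> 8) * 2
        imm8 = operand2 & 0xFF
        value = ((imm8 >> rotate) | (imm8 << (32 - rotate))) & 0xFFFFFFFF
        fields["operand2_value"] = value
    else:
        # Low 4 bits are the register index; shift bits are ignored here.
        fields["operand2_register"] = operand2 & 0xF
    return fields


add_registers = decode_data_processing(0xE0800001)  # ADD r0, r0, r1
add_immediate = decode_data_processing(0xE2811005)  # ADD r1, r1, #5
print(add_registers)
print(add_immediate)
```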

For more detailed information you should read the documentation for the specific ARM version you want to know about. I used this ARM7TDMI-S Data Sheet, Chapter 4 for this example.

Note that each ARM instruction, no matter how simple, takes 4 bytes to encode. Because of this overhead, many ARM processors also support an alternative 16-bit instruction set called Thumb. It cannot express everything the 32-bit instruction set can, but its instructions are half the size.

On the other hand, x86-64 instructions have a variable length encoding, and use all kinds of modifiers to adjust the behavior of individual instructions. If you want to compare the ARM instructions with how x86 and x86-64 instructions are encoded, you should read the x86-64 Instruction Encoding article that I wrote on OSDev.org.
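For contrast with the fixed 32-bit ARM format, here is a small Python sketch of the x86 side. It lists three well-known 32-bit x86 encodings of different lengths and decodes the ModRM byte of one of them. The helper names are made up, and the decoder handles only the register-to-register form of opcode 0x01 (ADD r/m32, r32); prefixes, memory operands and everything else are left out.

```python
# Three x86 instructions (32-bit operand size) and their encodings.
SAMPLES = {
    "nop": bytes([0x90]),
    "add eax, ebx": bytes([0x01, 0xD8]),
    "mov eax, 42": bytes([0xB8, 0x2A, 0x00, 0x00, 0x00]),
}

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte: int):
    """Split a ModRM byte into its mod (2 bits), reg (3) and rm (3) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_add_rm32_r32(code: bytes) -> str:
    """Decode opcode 0x01 followed by a register-direct ModRM byte."""
    if len(code) != 2 or code[0] != 0x01:
        raise ValueError("only the two-byte form of opcode 0x01 is handled")
    mod, reg, rm = split_modrm(code[1])
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    return f"add {REG32[rm]}, {REG32[reg]}"


for text, encoding in SAMPLES.items():
    print(f"{text:14} {len(encoding)} byte(s): {encoding.hex(' ')}")
print(decode_add_rm32_r32(SAMPLES["add eax, ebx"]))
```

The same register-to-register addition that takes four bytes on ARM fits in two bytes here, while loading a 32-bit constant takes five.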

Your original question is very broad. If you want to know more, you should do some research and create a new question with the specific thing you want to know.

Answer 2:

The PE file format (and the ELF/COFF file formats on non-Windows machines) defines a header near the beginning of the file, and in this header there is a 'Machine' code. In a PE file, this header follows the MS-DOS stub and the 'PE\0\0' signature, whose file offset is stored at offset 0x3C. The 'Machine' code is 2 bytes, and the spec defines a bunch of constants for various machines:

0x1d3   Matsushita AM33
0x8664  AMD x64
0x1c0   ARM little endian   
0x1c4   ARMv7 (or higher) Thumb mode only
0xebc   EFI byte code   
0x14c   Intel 386 or later processors and compatible processors 
0x200   Intel Itanium processor family  
0x9041  Mitsubishi M32R little endian   
0x266   MIPS16  
0x366   MIPS with FPU
0x466   MIPS16 with FPU 
0x1f0   Power PC little endian  
0x1f1   Power PC with floating point support    
0x166   MIPS little endian  
0x1a2   Hitachi SH3 
0x1a3   Hitachi SH3 DSP 
0x1a6   Hitachi SH4 
0x1a8   Hitachi SH5     
0x1c2   ARM or Thumb (“interworking”)   
0x169   MIPS little endian WCE v2

Then, within the PE (or ELF) file there are one or more 'Code' sections that contain (binary) machine code. That code is loaded into memory and executed directly by the CPU. The OS or dynamic linker/loader (which does the actual loading) knows what machine it is running on, so it checks the 'Machine' code in the header to make sure it matches before attempting to load and execute the code. If it doesn't match, the executable will be rejected, as it can't be run.
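The check on the 'Machine' code can be sketched in a few lines of Python. This is only an illustration of where the field lives, not what the Windows loader actually does: the function names are made up, the lookup table holds just a handful of the values listed above, and the sample input is a synthetic buffer containing only the fields being read, not a loadable EXE.

```python
import struct

# A few of the Machine values from the list above.
MACHINE_NAMES = {
    0x014C: "Intel 386 or later",
    0x8664: "AMD x64",
    0x01C0: "ARM little endian",
    0x01C4: "ARMv7 (or higher) Thumb mode only",
    0x0200: "Intel Itanium",
}


def read_pe_machine(data: bytes) -> int:
    """Return the 2-byte Machine value from the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The little-endian 32-bit value at offset 0x3C locates the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Machine is the first header field right after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def make_fake_pe(machine: int) -> bytes:
    """Build a tiny synthetic buffer holding only the fields read above."""
    pe_offset = 0x40
    dos_part = b"MZ" + bytes(0x3C - 2) + struct.pack("<I", pe_offset)
    return dos_part + b"PE\0\0" + struct.pack("<H", machine)


sample = make_fake_pe(0x8664)
machine = read_pe_machine(sample)
print(hex(machine), MACHINE_NAMES.get(machine, "unknown"))
```

To inspect a real executable, you would pass the contents of the file instead, for example `read_pe_machine(open(path, "rb").read())` with `path` pointing at an EXE or DLL of your choice.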