I see code like:
mov ax, cs
mov ds, ax
mov es, ax
Why can't I just compress this to:
mov ds, cs
mov es, cs
Is the first way faster since it's using the accumulator register? That doesn't seem intuitive, though, since cs and ds are segment registers. Or is there some restriction that I'm unaware of?
I'm using nasm by the way.
It's not the assembly language really but the underlying machine language which prevents these operations.
While assembly is made up of easy-to-read words, or mnemonics, these represent quite directly the 1s and 0s of the machine code. On x86 CPUs each instruction is typically made up of a sequence of bytes, with individual bytes or even bits within the bytes having meaning. Certain bits represent the instruction; others represent the addressing mode. In register addressing modes such as your examples, some bits represent which specific registers are to be used as the source and destination of the mov instruction.
Now the x86 family of processors goes back a long way, to the 1970s, when CPU architecture was simpler. In those days the concept of the accumulator was of key importance; ax is the 16-bit x86 accumulator. All calculations were built up or "accumulated" in this register, so it was available to all instructions. Other general-purpose registers had a more restricted range of use.
Because instructions were based on bytes, you wanted as few bytes as possible to represent an instruction, to keep instruction decoding fast. To keep as many instructions as short as possible, the use of the accumulator was made central.
On more modern CPUs such as the Motorola 680x0, more of the general-purpose registers have abilities that were previously the domain of the accumulator. On RISC CPUs all registers are as flexible as accumulators. I have heard that in 64-bit mode the current x86/amd64 instruction set is now much less restricted.
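To make the bit-level description above concrete, here is a small NASM listing of how the three instructions from the question are encoded in 16-bit mode. It is a sketch rather than a definitive reference. The byte values in the comments follow the Intel encodings 8C /r (MOV r/m16, Sreg) and 8E /r (MOV Sreg, r/m16); the file name demo.asm is only an assumption for illustration.

```nasm
; Assemble with a listing to see the bytes yourself:
;   nasm -f bin -l demo.lst demo.asm      (demo.asm is a hypothetical file name)
bits 16

        mov ax, cs      ; 8C C8   opcode 8C = MOV r/m16, Sreg
                        ;         ModRM C8 = 11 001 000b: mod=11 (register), reg=001 (CS), r/m=000 (AX)
        mov ds, ax      ; 8E D8   opcode 8E = MOV Sreg, r/m16
                        ;         ModRM D8 = 11 011 000b: mod=11 (register), reg=011 (DS), r/m=000 (AX)
        mov es, ax      ; 8E C0   ModRM C0 = 11 000 000b: mod=11 (register), reg=000 (ES), r/m=000 (AX)
```

In both opcodes the ModRM reg field selects the segment register, while the r/m field can only name a general-purpose register or a memory operand, so no bit pattern is left over that would mean "segment register to segment register".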
Look at the Intel Manual Volume 2, Instruction Set Reference (325383-056US, September 2015), section "MOV Move", column "Instruction".
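The only 16-bit mov into a segment register is encoded as:
8E /r    MOV Sreg, r/m16
And "3.1.1.3 Instruction Column in the Opcode Summary Table" explains that r/m16 is a word general-purpose register or memory operand, and that Sreg is a segment register.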
Thus mov ds, cs is not encodable, as there is no mov Sreg, Sreg version.
You can't mov segment register to segment register -- there's no instruction for it.
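Since the direct form does not exist, the copy has to go through something else. The following is a minimal sketch of two common ways to do that in 16-bit code, assuming real-mode NASM source and, for the second option, a stack that has already been set up. The intermediate register does not have to be ax; bx is used here only to show that any 16-bit general-purpose register works.

```nasm
bits 16

        ; Option 1: copy through a general-purpose register (clobbers BX)
        mov bx, cs
        mov ds, bx
        mov es, bx

        ; Option 2: copy through the stack (no general-purpose register is modified,
        ; but SS:SP must already point at usable memory)
        push cs
        pop ds
        push cs
        pop es
```

Option 2 is shorter in bytes (push cs, pop ds and pop es are one-byte instructions in 16-bit mode) but touches memory; note that these push/pop forms are valid in 16-bit and 32-bit modes and are not available in 64-bit mode.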
There is only so much room in a processor for the microcode for all its instructions. So one general instruction is often preferred over several special-purpose ones for rarely used operations like changing segment registers. Also, for some processors the number of instructions is absolutely limited by the architecture; for example, the original 8080 processor was limited to 256 instructions, as they all had to have the opcode encoded in a single byte.