Why isn't movl from memory to memory allowed?

Posted 2020-01-26 10:12

I was wondering whether this is allowed in assembly:

 movl (%edx) (%eax) 

I would have guessed that it accesses the memory in the first operand and puts the value into the memory of the second operand, something like *a = *b, but I haven't seen any example doing that, so I'm guessing it's not allowed. I've also been told that this isn't allowed:

 leal %esi (%edi)

Why is that? Lastly, are there other similar instructions that aren't allowed that I should be aware of?

Tags: assembly x86
2 answers
男人必须洒脱
Reply #2 · 2020-01-26 10:46

It is not valid. You cannot perform memory-to-memory moves directly on any architecture that I am familiar with, except with a limited set of operands. The exceptions are string moves and similar instructions, which go through the SI and DI registers on Intel-compatible processors, though these should often be avoided (see below). Most architectures do have something that assists with these limited memory-to-memory moves.
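For the simple `*a = *b` case from the question, the usual workaround is a load into a scratch register followed by a store. Below is a minimal sketch in C using GCC/Clang extended inline asm (AT&T syntax, as in the question). The helper name `copy_u32` is hypothetical, and the asm path assumes an x86 or x86-64 target, with a plain C fallback elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: *dst = *src for one 32-bit value.
   x86 has no "movl (mem), (mem)", so the value travels through a
   register: one load, then one store. Asm path assumes GCC/Clang on x86. */
static void copy_u32(uint32_t *dst, const uint32_t *src)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t tmp;
    __asm__ ("movl %2, %0\n\t"   /* load:  tmp  = *src */
             "movl %0, %1"       /* store: *dst = tmp  */
             : "=&r"(tmp), "=m"(*dst)  /* early-clobber: tmp must not share a register with *dst's address */
             : "m"(*src));
#else
    *dst = *src;                 /* portable fallback */
#endif
}
```

A compiler typically emits essentially this same load/store pair for a plain `*a = *b;`, so in practice you would just write the C.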

The restriction makes a great deal of sense if you think about the hardware. There are address lines and data lines. The processor signals which memory address to access on the address lines, and the data is then read or written via the data lines. Because of this, data must pass through the cache or the processor to get to other memory. In fact, page 145 of this reference makes the strong statement that MOVSB and MOVSW should never be used in optimized code, and explains that REP MOVS is fast only under certain conditions:

Note that while the REP MOVS instruction writes a word to the destination, it reads the next word from the source in the same clock cycle. You can have a cache bank conflict if bit 2-4 are the same in these two addresses on P2 and P3. In other words, you will get a penalty of one clock extra per iteration if ESI+WORDSIZE-EDI is divisible by 32. The easiest way to avoid cache bank conflicts is to align both source and destination by 8. Never use MOVSB or MOVSW in optimized code, not even in 16-bit mode.

On many processors, REP MOVS and REP STOS can perform fast by moving 16 bytes or an entire cache line at a time. This happens only when certain conditions are met. Depending on the processor, the conditions for fast string instructions are, typically, that the count must be high, both source and destination must be aligned, the direction must be forward, the distance between source and destination must be at least the cache line size, and the memory type for both source and destination must be either write-back or write-combining (you can normally assume the latter condition is met).

Under these conditions, the speed is as high as you can obtain with vector register moves or even faster on some processors. While the string instructions can be quite convenient, it must be emphasized that other solutions are faster in many cases. If the above conditions for fast move are not met then there is a lot to gain by using other methods.

The hardware argument also, in a sense, explains why register-to-register moves are fine (though there are other reasons). Perhaps I should say it explains why they don't require any special hardware on the board: the registers are all inside the processor, so there's no need to access the bus to read and write via addresses.
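For comparison, here is a sketch of the string-instruction exception. `rep movsl` (Intel name `rep movsd`) copies `count` 32-bit words from the address in (E/R)SI to the address in (E/R)DI, with the count in (E/R)CX; all of these operands are implicit. It assumes GCC/Clang inline asm on x86/x86-64 and relies on the calling-convention guarantee that the direction flag is clear on function entry. `copy_u32s` is a hypothetical name. Per the quote above, whether this beats a vector loop depends on the CPU, the size, and alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `count` 32-bit words with REP MOVSL.
   Source address in (E/R)SI, destination in (E/R)DI, count in (E/R)CX.
   Assumes the direction flag is clear (forward copy), as the ABI requires. */
static void copy_u32s(uint32_t *dst, const uint32_t *src, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile ("rep movsl"
                      : "+D"(dst), "+S"(src), "+c"(count)  /* all three are consumed/updated */
                      : /* no other inputs */
                      : "memory");                         /* writes memory behind dst */
#else
    memcpy(dst, src, count * sizeof *dst);                 /* portable fallback */
#endif
}
```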

一夜七次
Reply #3 · 2020-01-26 10:48
movl (mem), (mem)

mov dword [eax], [ecx]    ; or the equivalent in Intel-syntax

is invalid because x86 machine code doesn't have an encoding for mov with two addresses. It has mov r32, r/m32 and mov r/m32, r32. Reg-reg moves can be encoded using either the mov r32, r/m32 opcode or the mov r/m32, r32 opcode. Many other instructions have two opcodes, one where the dest has to be a register, and one where the src has to be a register.

(And there are some specialized forms, like mov r32, imm32, or movabs r64, [64bit-absolute-address].)
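To make the two-opcode point concrete, the sketch below builds both encodings of the same register-to-register `mov`. Opcode 0x89 (`mov r/m32, r32`) puts the destination in the ModRM rm field, while 0x8B (`mov r32, r/m32`) puts it in the reg field; mod = 3 marks rm as a register rather than memory. The function and enum names are illustrative only, not from any library.

```c
#include <stdint.h>

/* Register numbers as they appear in ModRM fields (32-bit names). */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Pack a ModRM byte: mod (bits 7-6) | reg (bits 5-3) | rm (bits 2-0). */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Encode "dst = src" (register to register) with both opcodes. */
static void encode_mov_reg_reg(unsigned dst, unsigned src,
                               uint8_t via_89[2], uint8_t via_8b[2])
{
    via_89[0] = 0x89; via_89[1] = modrm(3, src, dst); /* mov r/m32, r32: rm = dst  */
    via_8b[0] = 0x8B; via_8b[1] = modrm(3, dst, src); /* mov r32, r/m32: reg = dst */
}
```

For `movl %eax, %ecx` (dst = ECX, src = EAX) this yields `89 C1` and `8B C8`, two different byte sequences for the same operation.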

See the x86 instruction set reference manual (links in the x86 tag wiki https://stackoverflow.com/tags/x86/info). I used Intel/NASM syntax here because that's what the instruction-set reference manual uses.

Very few instructions can load from one address and store to a different one, e.g. movs (string move) and push/pop (mem) (see the question "What x86 instructions take two (or more) memory operands?"). Many ALU instructions are available with a memory destination, which makes them do a read-modify-write on a single memory location.
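A memory-destination ALU instruction reads and writes one and the same location. A minimal sketch, assuming GCC/Clang inline asm on x86 (`add_to_mem` is a hypothetical name): `addl` reads `*p`, adds `v`, and writes the sum back to the same address. Without a `lock` prefix this read-modify-write is not atomic with respect to other cores.

```c
#include <stdint.h>

/* Hypothetical helper: *p += v as a single instruction with a memory destination. */
static void add_to_mem(uint32_t *p, uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"       /* read *p, add v, write *p back */
             : "+m"(*p)          /* one memory operand, both read and written */
             : "ri"(v)           /* source: register or immediate */
             : "cc");            /* add modifies the flags */
#else
    *p += v;                     /* portable fallback */
#endif
}
```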


There are no instructions that take two arbitrary effective addresses (i.e. specified with a flexible addressing mode). movs has implicit source and destination operands, and push has an implicit destination (the stack, addressed by esp).

An x86 instruction has at most one ModRM byte, and a ModRM byte can encode only one reg/memory operand (2 bits for the mode, 3 bits for the base register) plus one register-only operand (3 bits). With an escape code, ModRM can signal a SIB byte to encode base + scaled-index for the memory operand, but there's still only room to encode one memory operand.
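The field layout described above can be checked with a small decoder. This sketch (hypothetical names; 32-bit addressing assumed; REX prefixes and 16-bit forms ignored) splits a ModRM byte into mod, reg and rm. Only rm, and only when mod != 3, refers to memory; rm = 100b with a memory mod is the escape that pulls in a SIB byte, which still describes that same single operand.

```c
#include <stdbool.h>
#include <stdint.h>

struct modrm_fields {
    unsigned mod; /* bits 7-6: 3 = rm is a register, 0-2 = rm is memory  */
    unsigned reg; /* bits 5-3: always a register (or an opcode extension) */
    unsigned rm;  /* bits 2-0: register, or base of the memory operand     */
};

static struct modrm_fields decode_modrm(uint8_t b)
{
    struct modrm_fields f = { b >> 6, (b >> 3) & 7u, b & 7u };
    return f;
}

/* Only the rm field can name memory, so ModRM yields at most one memory operand. */
static bool modrm_has_memory_operand(uint8_t b)
{
    return decode_modrm(b).mod != 3;
}

/* 32-bit addressing: rm = 4 with a memory mod means a SIB byte follows. */
static bool modrm_needs_sib(uint8_t b)
{
    struct modrm_fields f = decode_modrm(b);
    return f.mod != 3 && f.rm == 4;
}
```

For example, `89 01` (`mov [ecx], eax`) decodes to mod = 0, reg = 0 (eax), rm = 1 ([ecx]): one register, one memory operand.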

As I mentioned above, the memory-source and memory-destination forms of the same instruction (asm source mnemonic) use two different opcodes. As far as the hardware is concerned, they are different instructions.


The reasons for this design choice are probably partly implementation complexity: If it's possible for a single instruction to need two results from an AGU (address-generation-unit), then the wiring has to be there to make that possible. Some of this complexity is in the decoders that figure out which instruction an opcode is, and parse the remaining bits and bytes to figure out what the operands are. Since no other instruction can have multiple r/m operands, it would cost extra transistors (silicon area) to support it.

It also potentially gives an instruction five input dependencies (a two-register addressing mode for the store address, the same for the load address, and the load data). When 8086 / 80386 was being designed, superscalar / out-of-order / dependency tracking probably wasn't on the radar. The 386 added a lot of new instructions, so a mem-to-mem encoding of mov could have been added, but wasn't. If the 386 had started forwarding results directly from ALU output to ALU input and the like (to reduce latency compared to always committing results to the register file), this would have been one of the reasons not to implement it.

If it existed, Intel P6 would probably decode it to two separate uops, a load and a store. It certainly wouldn't make sense to introduce now, or any time after 1995 when P6 was designed and simpler instructions gained more of a speed advantage over complex ones. (See http://agner.org/optimize/ for stuff about making code run fast.)

I can't see this being very useful, anyway. If you want this, you're probably not making enough use of registers. Figure out how to process your data on the fly while copying, if possible. Of course, sometimes you just have to do a load and then a store, e.g. in a sort routine to swap the rest of a struct after comparing based on one member. Doing moves in larger blocks, using xmm registers, is a good idea.
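A sketch of copying in 16-byte blocks through an xmm register, using the SSE2 intrinsics `_mm_loadu_si128` / `_mm_storeu_si128` (these typically compile to unaligned 16-byte loads and stores such as `movdqu`). Each iteration is still a load followed by a store. `copy_blocks16` is a hypothetical name, and targets without SSE2 fall back to `memcpy`; in real code `memcpy` itself is usually already optimized along these lines.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes, 16 at a time through an xmm register. */
static void copy_blocks16(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i)); /* 16-byte load  */
        _mm_storeu_si128((__m128i *)(d + i), v);               /* 16-byte store */
    }
#endif
    memcpy(d + i, s + i, n - i); /* leftover tail, or everything without SSE2 */
}
```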


leal %esi, (%edi)

Two problems here:

First, registers don't have addresses. A bare %esi is not a valid effective-address.

Second, lea's destination must be a register. There's no encoding where it takes a second effective-address to store the destination to memory.
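A valid way to get an `lea` result into memory is to compute it into a register and then store it with a separate `mov`. A sketch assuming GCC/Clang inline asm on x86 (`store_scaled_sum` is a hypothetical name). Note that `lea` only does the address arithmetic and never accesses memory.

```c
#include <stdint.h>

/* Hypothetical helper: *out = base + index*4, using lea into a register, then a store. */
static void store_scaled_sum(uint32_t *out, uint32_t base, uint32_t index)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t tmp;
    __asm__ ("leal (%2,%3,4), %0\n\t" /* tmp = base + index*4, no memory access */
             "movl %0, %1"            /* *out = tmp (the only memory write)     */
             : "=&r"(tmp), "=m"(*out)
             : "r"(base), "r"(index));
#else
    *out = base + index * 4;          /* portable fallback */
#endif
}
```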


BTW, neither is valid as written, because you left out the comma between the two operands. GAS reports:

valid-asm.s:2: Error: number of operands mismatch for `lea'

The rest of the answer only discusses the code after fixing that syntax error.
