I can't figure out a way to move code from one location to another in memory, so I put together something like this, but it doesn't work:
extern _transfer_code_segment
extern _kernel_segment
extern _kernel_reloc
extern _kernel_reloc_segment
extern _kernel_para_size
section .text16
global transfer_to_kernel
transfer_to_kernel:
;cld
;
; Turn off interrupts -- the stack gets destroyed during this routine.
; kernel must set up its own stack.
;
;cli
; stack frame just for this function
push ebp
mov ebp, esp
mov eax, _kernel_segment ; source segment
mov ebx, _kernel_reloc_segment ; dest segment
mov ecx, _kernel_para_size
.loop:
; XXX: Will changing the segment registers this many times have
; acceptable performance?
mov ds, eax ; this is where the error is
mov es, ebx ; this too
xor esi, esi
xor edi, edi
movsd
movsd
movsd
movsd
inc eax
inc ebx
dec ecx
jnz .loop
leave
ret
Do you have any other way to do it, or how can I solve this problem?
That will have horrible performance. Agner Fog says mov sr, r has a throughput of one per 13 cycles on Nehalem, and I'd guess that if anything it's worse on more recent CPUs, since segmentation is obsolete. Agner stopped testing mov to/from segment-register performance after Nehalem.
Are you doing this to let you copy more than 64kiB total? If so, at least copy a full 64kiB before changing a segment register.
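Something along these lines would copy a full 64 KiB between segment-register reloads (a rough, untested sketch; _kernel_64k_chunks is a made-up symbol for however many 64 KiB chunks there are, and it assumes the total size is a multiple of 64 KiB):

    cld
    mov ax, _kernel_segment
    mov bx, _kernel_reloc_segment
    mov dx, _kernel_64k_chunks      ; hypothetical: total size / 64 KiB
.chunk:
    mov ds, ax                      ; only two segment writes per 64 KiB
    mov es, bx
    xor si, si
    xor di, di
    mov cx, 0x4000                  ; 0x4000 dwords = 64 KiB
    rep movsd
    add ax, 0x1000                  ; 0x1000 paragraphs = 64 KiB
    add bx, 0x1000
    dec dx
    jnz .chunk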
I think you can use 32-bit addressing modes to avoid messing with segments, but segments that you set in 16-bit mode implicitly have a "limit" of 64k. (i.e. mov eax, [esi] is encodable in 16-bit mode, with an operand-size and address-size prefix, but with a value in esi of more than 0xFFFF, I think it would fault for violating the ds segment limit.) See the osdev link below for more.
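For instance, this assembles fine in a BITS 16 file (a trivial check, not something you'd want to run as-is):

bits 16
    mov eax, [esi]      ; NASM adds the operand-size (0x66) and address-size
                        ; (0x67) prefixes; in real mode it should fault if esi
                        ; is above the 64 KiB ds limit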
As Cody says, use rep movsd to let the CPU use an optimized microcoded memcpy. (Or rep movsb, but only on CPUs with the ERMSB feature. In practice, most CPUs that support ERMSB give the same performance benefit for rep movsd too, so it's probably easiest to just always use rep movsd, but IvyBridge might not.) It's much faster than separate movsd instructions (which are slower than separate mov loads/stores). A loop with SSE 16B vector loads/stores might go almost as fast as rep movsd on some CPUs, but you can't use AVX for 32B vectors in 16-bit mode.
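Such a loop might look roughly like this (a sketch only; it assumes CR4.OSFXSR has already been set so SSE instructions don't fault, that ds:si and es:di point at the buffers, and that cx counts 64-byte blocks):

.sse_loop:
    movups xmm0, [si]
    movups xmm1, [si+16]
    movups xmm2, [si+32]
    movups xmm3, [si+48]
    movups [es:di], xmm0
    movups [es:di+16], xmm1
    movups [es:di+32], xmm2
    movups [es:di+48], xmm3
    add si, 64
    add di, 64
    dec cx
    jnz .sse_loop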
Another option for big copies: huge unreal mode
In 32-bit protected mode, the values you put in segment registers are selectors, not the actual segment base itself: mov es, ax triggers the CPU to use the value as an index into the GDT or LDT and get the segment base / limit from there.
If you do this in 32-bit mode and then switch back to 16-bit mode, you're in huge unreal mode with segments that can be larger than 64k. The segment base/limit/permissions stay cached until something writes a segment register in 16-bit mode and puts it back to the usual 16*seg with a 64k limit (if I'm describing this correctly). See http://wiki.osdev.org/Unreal_Mode for more.
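A rough sketch of the setup (see the osdev page for a complete, tested version; gdt_desc and its flat 4 GiB data descriptor are assumed to be defined elsewhere, and cs stays a normal 16-bit code segment throughout):

    cli
    lgdt [gdt_desc]                  ; GDT whose entry 1 is a flat 4 GiB data segment
    mov eax, cr0
    or al, 1                         ; set CR0.PE: protected mode
    mov cr0, eax
    mov bx, 0x08                     ; selector 0x08 = GDT entry 1
    mov ds, bx                       ; caches base=0, limit=4 GiB in ds
    mov es, bx                       ; ... and in es
    and al, 0xFE                     ; clear CR0.PE: back to real mode
    mov cr0, eax
    sti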
Then you may be able to use rep movsd in 16-bit mode with operand-size and address-size prefixes so you can copy more than 64kiB in one go.
This works well for ds and es, but interrupts will set cs:ip, so this isn't convenient for a big flat code address space, just data.
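Once ds and es have the 4 GiB limits cached, the copy itself might be as small as this (untested; the linear addresses and the size below are placeholders):

    cld
    mov esi, 0x100000                ; placeholder source linear address (ds base = 0)
    mov edi, 0x200000                ; placeholder destination linear address (es base = 0)
    mov ecx, (256*1024)/4            ; e.g. 256 KiB, counted in dwords
    a32 rep movsd                    ; address-size prefix so ecx/esi/edi are used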
The segment registers are all 16 bits in size. Compare that to the e?x registers, which are 32 bits in size. Obviously, these two things are not the same size, prompting your assembler to generate an "operand size mismatch" error: the sizes of the two operands do not match.
Presumably, you want to initialize the segment register with the lower 16 bits of the register, so you would do something like:
mov ds, ax
mov es, bx
Also, no, you don't actually need to initialize the segment registers on each iteration of the loop. What you're doing now is incrementing the segment and forcing the offset to 0, then copying 4 DWORDs. What you should be doing is leaving the segment alone and just incrementing the offset (which the MOVSD instruction does implicitly).
mov eax, _kernel_segment ; TODO: see why these segment values are not
mov ebx, _kernel_reloc_segment ; already stored as 16 bit values
mov ecx, _kernel_para_size
mov ds, ax
mov es, bx
xor esi, esi
xor edi, edi
.loop:
movsd
movsd
movsd
movsd
dec ecx
jnz .loop
But note that adding the REP prefix to the MOVSD instruction would allow you to do this even more efficiently. This basically does MOVSD a total of ECX times. For example:
mov ds, ax
mov es, bx
xor esi, esi
xor edi, edi
shl ecx, 2 ; multiply the count by 4: one MOVSD per count now, instead of 4 per loop iteration
rep movsd
Somewhat counter-intuitively, if your processor implements the ERMSB optimization (Intel Ivy Bridge and later), REP MOVSB may actually be faster than REP MOVSD, so you could do:
mov ds, ax
mov es, bx
xor esi, esi
xor edi, edi
shl ecx, 4 ; multiply the count by 16: one byte per MOVSB, 16 bytes per paragraph
rep movsb
Finally, although you've commented out the CLD instruction in your code, you do need to have it in order to ensure that the moves happen according to plan. You cannot rely on the direction flag having a particular value; you need to initialize it yourself to the value that you want.
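Putting those pieces together, a minimal sketch of the routine might look like this (untested, and it still assumes the total size fits in one 64 KiB segment, just like your original segment:offset loop does):

transfer_to_kernel:
    push ebp
    mov ebp, esp
    cld                              ; DF=0 so movsd moves forward through memory
    mov eax, _kernel_segment
    mov ebx, _kernel_reloc_segment
    mov ecx, _kernel_para_size
    mov ds, ax                       ; 16-bit moves into the segment registers
    mov es, bx
    xor si, si
    xor di, di
    shl cx, 2                        ; paragraphs -> dwords (16 bytes = 4 dwords);
                                     ; rep in 16-bit code counts with cx
    rep movsd
    leave
    ret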
(Another alternative would be streaming SIMD instructions or even floating-point stores, neither of which care about the direction flag. This has the advantage of increasing memory-copy bandwidth, because you'd be doing 64-bit, 128-bit, or larger copies at a time, but it introduces other disadvantages. In a kernel, I'd stick with MOVSD/MOVSB unless you can prove the copy is a significant bottleneck and/or you want to have optimized paths for different processors.)