What is the best/fastest way to load a 64-bit integer value into an xmm
SSE2 register (as a double) in 32-bit mode?
In 64-bit mode, cvtsi2sd can be used, but in 32-bit mode, it supports only 32-bit integers.
So far I haven't found much beyond:
- use fild, fstp to stack, then movsd to xmm register
- load the high 32-bit portion, multiply by 2^32, add the low 32 bits
The first solution is slow; the second might introduce precision loss (edit: and it is slow anyway, since the low 32 bits have to be converted as unsigned...)
Any better approach?
Your second option can be made to work, though it's a little unwieldy. I'll assume that your 64-bit number is initially in edx:eax.
cvtsi2sd xmm0, edx // high part * 2**-32
mulsd xmm0, [2**32 from mem] // high part
movsd xmm2, [2**52 from mem]
movd xmm1, eax
orpd xmm1, xmm2 // (double)(2**52 + low part as unsigned)
subsd xmm1, xmm2 // (double)(low part as unsigned)
addsd xmm0, xmm1 // (double)(high part + low part as unsigned)
All of the operations except for possibly the final one are exact, so this is correctly rounded. It should be noted that this conversion produces -0.0 when the input is 0 and the mxcsr is set to round-to-minus-infinity. This would need to be addressed if it were being used in a runtime library for a compiler aiming to provide IEEE-754 conformance, but is not an issue for most usage.
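
If it helps to see the same arithmetic outside of assembly, here is a rough C sketch of the sequence above. The function name i64_to_double is made up for illustration, and the code assumes IEEE-754 binary64 doubles and two's-complement integers; it uses memcpy to stand in for the movd/orpd bit trick rather than actual SSE2 instructions, so treat it as a way to check the reasoning, not as the fastest implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any library): converts a signed 64-bit
   integer to double using one signed 32-bit conversion plus the 2**52
   bit trick. Assumes IEEE-754 binary64 and two's complement. */
static double i64_to_double(int64_t v)
{
    /* Split into the halves that the assembly keeps in edx:eax. */
    uint32_t lo = (uint32_t)(uint64_t)v;                      /* eax, unsigned */
    int32_t  hi = (int32_t)(uint32_t)((uint64_t)v >> 32);     /* edx, signed   */

    /* cvtsi2sd + mulsd: the product has at most 31 significant bits,
       so it is exact. 4294967296.0 is 2**32. */
    double high = (double)hi * 4294967296.0;

    /* movd + orpd: 0x4330000000000000 is the bit pattern of 2**52.
       OR-ing lo into the low mantissa bits yields exactly 2**52 + lo. */
    uint64_t bits = UINT64_C(0x4330000000000000) | (uint64_t)lo;
    double low;
    memcpy(&low, &bits, sizeof low);

    /* subsd: removes the 2**52 bias, exact. 4503599627370496.0 is 2**52. */
    low -= 4503599627370496.0;

    /* addsd: the only step that may round. */
    return high + low;
}
```

For example, i64_to_double(-1) splits into hi = -1 and lo = 0xFFFFFFFF, giving -4294967296.0 + 4294967295.0, which is -1.0. The signed-zero caveat for round-to-minus-infinity applies to this sketch as well.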