How can I multiply two hex 128 bit numbers in asse

I have two 128 bit numbers in memory in hexadecimal, for example (little endian):

x:0x12 0x45 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
y:0x36 0xa1 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

I've to perform the unsigned multiplication between these two numbers so my new number will be:

z:0xcc 0xe3 0x7e 0x2b 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Now, I'm aware that I can move the half x and y number into rax and rbx registers and, for example, do the mul operation, and do the same with the other half. The problem is that by doing so I lose the carry-over and I've no idea how I can avoid that. It's about 4 hours I'm facing this problem and the only solution that can I see is the conversion in binary (and <-> shl,1).

Can you give me some input about this problem?
I think the best solution is to take one byte par time.

标签： algorithm assembly byte x86-64 multiplication

2条回答

smile是对你的礼貌

2楼-- · 2019-01-28 02:39

Let μ = 2⁶⁴, then we can decompose your 128 bit numbers a and b into a = a₁μ + a₂ and b = b₁μ + b₂. Then we can compute c = ab with 64 · 64 → 128 bit multiplications by first computing partial products:

q₁μ + q₂ = a₂b₂
r₁μ + r₂ = a₁b₂
s₁μ + s₂ = a₂b₁
t₁μ + t₂ = a₁b₁

and then accumulating them into a 256 bit result (watch the overflow when doing the additions!):

c = t₁μ³ + (t₂ + s₁ + r₁) μ² + (s₂ + r₂ + q₁) μ + q₂

0人赞添加讨论(0) 举报

Ridiculous、

3楼-- · 2019-01-28 02:49

As usual, ask a compiler how to do something efficiently: GNU C on 64-bit platforms supports __int128_t and __uint128_t.

__uint128_t mul128(__uint128_t a, __uint128_t b) { return a*b; }

compiles to (gcc6.2 -O3 on Godbolt)

    imul    rsi, rdx        # tmp94, b
    mov     rax, rdi  # tmp93, a
    imul    rcx, rdi        # tmp95, a
    mul     rdx       # b
    add     rcx, rsi  # tmp96, tmp94
    add     rdx, rcx  #, tmp96
    ret

Since this is targeting the x86-64 System V calling convention, a is in RSI:RDI, while b is in RCX:RDX. The result is returned in RDX:RAX.

Pretty nifty that it only takes one MOV instruction, since gcc doesn't need the high-half result of a_upper * b_lower or vice versa. It can destroy the high halves of the inputs with the faster 2-operand form of IMUL since they're only used once.

With -march=haswell to enable BMI2, gcc uses MULX to avoid even the one MOV.

Sometimes compiler output isn't perfect, but very often the general strategy is a good starting point for optimizing by hand.

Of course, if what you really wanted in the first place was 128-bit multiplies in C, just use the compiler's built-in support for it. That lets the optimizer do its job, often giving better results than if you'd written a couple parts in inline-asm. (https://gcc.gnu.org/wiki/DontUseInlineAsm).

0人赞添加讨论(0) 举报

How can I multiply two hex 128 bit numbers in asse

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间