In GNU C inline asm, what're the modifiers for

2019-01-12 03:16发布

While trying to answer Embedded broadcasts with intrinsics and assembly, I was trying to do something like this:

__m512 mul_broad(__m512 a, float b) {
    int scratch = 0;
    asm(
        "vbroadcastss  %k[scalar], %q[scalar]\n\t"  // want  vbr..  %xmm0, %zmm0
        "vmulps        %q[scalar], %[vec], %[vec]\n\t"

        // how it's done for integer registers
        "movw         symbol(%q[inttmp]), %w[inttmp]\n\t"  // movw symbol(%rax), %ax
        "movsbl        %h[inttmp], %k[inttmp]\n\t"  // movsx %ah, %eax
        : [vec] "+x" (a), [scalar] "+x" (b),  [inttmp] "=r" (scratch)
        :
        :
    );
    return a;
}

The GNU C x86 Operand Modifiers doc only specifies modifiers up to q (DI (DoubleInt) size, 64bits). Using q on a vector register will always bring it down to xmm (from ymm or zmm).

The question:

What are the modifiers to change between sizes of vector register?

Also, are there any specific-size constraints for use with input or output operands? Something other than the generic x which can end up being xmm, ymm, or zmm depending on the type of the expression you put in the parentheses.

Off-topic:
clang appears to have some Yi / Yt constraints (not modifiers), but I can't find docs on that either. clang won't even compile this, even with the vector instructions commented out, because it doesn't like +x as a constraint for an __m512 vector.


Background / motivation

I can get the result I want by passing in the scalar as an input operand, constrained to be in the same register as a wider output operand, but it's clumsier. (The biggest downside for this use-case is that AFAIK it has to use an operand-number, rather than the [symbolic_name], so it's susceptible to breakage when adding/removing output constraints.)

// does what I want, by using a paired output and input constraint
__m512 mul_broad(__m512 a, float b) {
    __m512 tmpvec;
    asm(
        "vbroadcastss  %[scalar], %[tmpvec]\n\t"
        "vmulps        %[tmpvec], %[vec], %[vec]\n\t"
        : [vec] "+x" (a), [tmpvec] "=x" (tmpvec)
        : [scalar] "1" (b)
        :
    );

  return a;
}

godbolt link


Also, I think this whole approach to the problem I was trying to solve is going to be a dead end because Multi-Alternative constraints don't let you give different asm for the different constraint patterns. I was hoping to have x and r constraints end up emitting a vbroadcastss from a register, while m constraints end up emitting vmulps (mem_src){1to16}, %zmm_src2, %zmm_dst (a folded broadcast-load). The purpose of doing this with inline asm is that gcc doesn't yet know how to fold set1() memory operands into broadcast-loads (but clang does).

Anyway, this specific question is about operand modifiers and constraints for vector registers. Please focus on that, but comments and asides in answers are welcome on the other issue. (Or better, just comment / answer on Z Boson's question about embedded broadcasts.)

2条回答
爱情/是我丢掉的垃圾
2楼-- · 2019-01-12 03:39

It seems like all recent versions of GCC will accept both 'q' and 'x' as modifiers to print the XMM version of a YMM register.

Intel's icc looks to accept 'q', but not 'x' (at least through version 13.0.1).

[Edit: Well, it worked in this small example below, but in a real test case, I'm having problems with icc 14.0.3 accepting the 'q' but writing a 'ymm'.]

[Edit: Testing with more recent versions of icc, I'm finding that neither icc 15 nor icc 16 work with either 'q' or 'x'.]

But Clang 3.6 and earlier accept neither syntax. And at least on Godbolt, Clang 3.7 crashes with both!

// inline assembly modifiers to convert ymm to xmm

#include <x86intrin.h>
#include <stdint.h>

// gcc also accepts "%q1" as "%x1" 
// icc accepts "%q1" but not "%x1"
// clang-3.6 accepts neither
// clang-3.7 crashes with both!

#define ASM_MOVD(vec, reg)       \
__asm volatile("vmovd %q1, %0" : \
               "=r" (reg) :      \
               "x" (vec)         \
    );          

uint32_t movd_ymm(__m256i ymm) {
   uint32_t low;
   ASM_MOVD(ymm, low);
   return low;
}

uint32_t movd_xmm(__m128i xmm) {
   uint32_t low;
   ASM_MOVD(xmm, low);
   return low;
}

Link to test on Godbolt: http://goo.gl/bOkjNu

(Sorry that this isn't full answer to your question, but it seemed like useful information to share and was too long for a comment)

查看更多
一夜七次
3楼-- · 2019-01-12 03:44

From the file gcc/config/i386/i386.c of the GCC sources:

       b -- print the QImode name of the register for the indicated operand.
        %b0 would print %al if operands[0] is reg 0.
       w --  likewise, print the HImode name of the register.
       k --  likewise, print the SImode name of the register.
       q --  likewise, print the DImode name of the register.
       x --  likewise, print the V4SFmode name of the register.
       t --  likewise, print the V8SFmode name of the register.
       g --  likewise, print the V16SFmode name of the register.
       h -- print the QImode name for a "high" register, either ah, bh, ch or dh.

Similarly from gcc/config/i386/contraints.md:

    ;; We use the Y prefix to denote any number of conditional register sets:
    ;;  z   First SSE register.
    ;;  i   SSE2 inter-unit moves to SSE register enabled
    ;;  j   SSE2 inter-unit moves from SSE register enabled
    ;;  m   MMX inter-unit moves to MMX register enabled
    ;;  n   MMX inter-unit moves from MMX register enabled
    ;;  a   Integer register when zero extensions with AND are disabled
    ;;  p   Integer register when TARGET_PARTIAL_REG_STALL is disabled
    ;;  f   x87 register when 80387 floating point arithmetic is enabled
    ;;  r   SSE regs not requiring REX prefix when prefixes avoidance is enabled
    ;;  and all SSE regs otherwise

This file also defines a "Yk" constraint but I don't know if how well it would work in an asm statement:

    (define_register_constraint "Yk" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
    "@internal Any mask register that can be used as predicate, i.e. k1-k7.")

Note this is all copied from the latest SVN revision. I don't know what release of GCC, if any, the particular modifiers and constraints you're interested in were added.

查看更多
登录 后发表回答