ube error: _mm_aeskeygenassist_si128 intrinsic req

2019-07-21 12:45发布

问题:

I'm working under Sun Studio 12.3 on SunOS 5.11 (Solaris 11.3). Its providing a compile error that I don't quite understand:

$ /opt/solarisstudio12.3/bin/CC -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx 
"test.cxx", line 11: ube: error: _mm_aeskeygenassist_si128 intrinsic requires at least -xarch=aes.
CC: ube failed for test.cxx

Adding -m64 produces the same error.

There's not much to the test program. It simply exercises a SSE2 intrinsic, and an AES intrinsic:

$ cat test.cxx
#include <stdint.h>
#include <wmmintrin.h>
#include <emmintrin.h>
int main(int argc, char* argv[])
{
  // SSE2
  int64_t x[2];
  __m128i y = _mm_loadu_si128((__m128i*)x);

  // AES
  __m128i z = _mm_aeskeygenassist_si128(y,0);

  return 0;
}

I've been trying to work through the manual and learn how to specify multiple cpu architecture features, like SSE2, SSSE3, AES and SSE4. But I can't seem to determine how to specify multiple ones. Here's one of the more complete pages I have found: Oracle Man Page CC.1, but I'm obviously missing something with respect to -xarch.

What am I doing wrong, and how do I fix it?

回答1:

This command line

$ /opt/solarisstudio12.3/bin/CC -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx 

will use the last of -xarch=sse2 -xarch=aes -xarch=sse4_2 and cause the compiler to emit sse4_2-compatible binaries.

This is documented in Chapter 3 of the C++ User's Guide:

3.2 General Guidelines

Some general guidelines for the C++ compiler options are:

  • The-llib option links with library liblib.a (or liblib.so). It is always safer to put-llib after the source and object files to ensure the order in which libraries are searched.

  • In general, processing of the compiler options is from left to right (with the exception that-U options are processed after all-D options), allowing selective overriding of macro options (options that include other options). This rule does not apply to linker options.

  • The -features, -I -l, -L, -library, -pti, -R, -staticlib, -U, -verbose, and -xprefetch options accumulate, they do not override.

  • The -D option accumulates. However, multiple -D options for the same name override each other.

Source files, object files, and libraries are compiled and linked in the order in which they appear on the command line.

This is done so you can do things like override the expansion of arguments like -fast, which expands to about 10 separate arguments.

You should use the -xarch=aes flag - either last or as the only -xarch=... option.



回答2:

I'm going to toss in an answer for those coming from GCC. In the GCC world, we do -march=native and GCC defines macros like -D__SSE2__, -D__SSE4_1__, -D__SSE4_2__, -D__AES__, -D__AVX__, -D__BMI__, etc.

SunCC does not do like GCC does. It does not provide defines like __SSE2__; nor does it provide the value for -xarch.

Here are the references to the relevant Sun Studio manuals and the -xarch options/instructions set choices:

  • Sun Studio 12 Update 1 User's Guide (PDF; could not find online)
  • Solaris Studio 12.2 User's Guide
  • Solaris Studio 12.3 User's Guide
  • Solaris Studio 12.4 User's Guide
  • Developer Studio 12.5 User's Guide

Here's how we are determining what flags we can use, and then converting them to GCC preprocessor macros. Its awful, but I don't know how to get the code generated otherwise.

CC=...
EGREP=...

X86_CPU_FLAGS=$(isainfo -v 2>/dev/null)
SUNCC_510_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[0-9]|5\.[2-9]|[6-9]\.)")
SUNCC_511_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[1-9]|5\.[2-9]|[6-9]\.)")
SUNCC_512_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[2-9]|5\.[2-9]|[6-9]\.)")
SUNCC_513_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[3-9]|5\.[2-9]|[6-9]\.)")

SUNCC_XARCH=
if [[ ("$SUNCC_511_OR_ABOVE" -ne "0") ]]; then
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE2__"); SUNCC_XARCH=sse2; fi
        if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse3") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE3__"); SUNCC_XARCH=ssse3; fi
        if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "ssse3") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSSE3__"); SUNCC_XARCH=ssse3; fi
        if [[ ("$SUNCC_512_OR_ABOVE" -ne "0") ]]; then
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse4.1") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE4_1__"); SUNCC_XARCH=ssse4_1; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse4.2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE4_2__"); SUNCC_XARCH=ssse4_2; fi
            if [[ ("$SUNCC_513_OR_ABOVE" -ne "0") ]]; then
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "aes") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AES__"); SUNCC_XARCH=aes; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "pclmulqdq") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__PCLMUL__"); SUNCC_XARCH=aes; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "rdrand") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__RDRND__"); SUNCC_XARCH=avx_i; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "rdseed") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__RDSEED__"); SUNCC_XARCH=avx_i; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "avx") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AVX__"); SUNCC_XARCH=avx; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "avx2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AVX2__"); SUNCC_XARCH=avx2; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "bmi") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__BMI__"); SUNCC_XARCH=avx2; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "bmi2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__BMI2__"); SUNCC_XARCH=avx2; fi
                if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "adx") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__ADX__"); SUNCC_XARCH=avx2_i; fi        
            fi
        fi
    fi
fi
PLATFORM_CXXFLAGS+=("-xarch=$SUNCC_XARCH")

The gyrations above allow us to do things like this (except we need SSE2 though ADX).

#if (_MSC_VER >= 1700) || defined(__RDRND__)
    uint64_t val;
    if(_rdrand64_step(&val))
    {
        // Use RDRAND value
    }
#endif

Without the gyrations, we continually crash the 12.1 through 12.3 compilers during testing with the inline assembly and intrinsics.

The result of running the script gives us the recipe for CFLAGS and CXXFLAGS. Below is from a 4th gen Core i5. XEON's produce different results, as does a 5th gen Core i5. For example, a 5th gen Core i5 will have ADX and use -xarch=avx_i.

Pathname: /opt/solstudio12.2/bin/CC (symlinked)
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -xarch=ssse3

/opt/solarisstudio12.3/bin/CC (symlinked)
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -xarch=ssse4_2

Pathname: /opt/solarisstudio12.4/bin/CC
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -D__AES__ -D__PCLMUL__ -D__RDRND__ -D__AVX__ -xarch=avx

...