I'm working under Sun Studio 12.3 on SunOS 5.11 (Solaris 11.3). It's producing a compile error that I don't quite understand:
$ /opt/solarisstudio12.3/bin/CC -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx
"test.cxx", line 11: ube: error: _mm_aeskeygenassist_si128 intrinsic requires at least -xarch=aes.
CC: ube failed for test.cxx
Adding -m64 produces the same error.
There's not much to the test program. It simply exercises an SSE2 intrinsic and an AES intrinsic:
$ cat test.cxx
#include <stdint.h>
#include <wmmintrin.h>
#include <emmintrin.h>
int main(int argc, char* argv[])
{
    // SSE2
    int64_t x[2];
    __m128i y = _mm_loadu_si128((__m128i*)x);
    // AES
    __m128i z = _mm_aeskeygenassist_si128(y,0);
    return 0;
}
I've been trying to work through the manual and learn how to specify multiple CPU architecture features, like SSE2, SSSE3, AES and SSE4, but I can't determine how to specify more than one at a time. Here's one of the more complete pages I have found: Oracle Man Page CC.1, but I'm obviously missing something with respect to -xarch.
What am I doing wrong, and how do I fix it?
This command line
$ /opt/solarisstudio12.3/bin/CC -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx
will use only the last of -xarch=sse2 -xarch=aes -xarch=sse4_2, so the compiler emits sse4_2-compatible binaries. As the error message shows, sse4_2 does not enable the AES intrinsics.
This is documented in Chapter 3 of the C++ User's Guide:
3.2 General Guidelines
Some general guidelines for the C++ compiler options are:
- The -llib option links with library liblib.a (or liblib.so). It is always safer to put -llib after the source and object files to ensure the order in which libraries are searched.
- In general, processing of the compiler options is from left to right (with the exception that -U options are processed after all -D options), allowing selective overriding of macro options (options that include other options). This rule does not apply to linker options.
- The -features, -I -l, -L, -library, -pti, -R, -staticlib, -U, -verbose, and -xprefetch options accumulate, they do not override.
- The -D option accumulates. However, multiple -D options for the same name override each other.
Source files, object files, and libraries are compiled and linked in the order in which they appear on the command line.
This is done so you can override part of the expansion of macro options like -fast, which expands to about 10 separate arguments.
You should use the -xarch=aes flag, either as the last -xarch=... option or as the only one.
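To make the override rule concrete, here is a minimal shell sketch that models which -xarch value is still in effect after left-to-right processing. The function name effective_xarch is hypothetical, and the function only imitates the behavior described in the guide; it is not how the compiler driver is implemented.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last -xarch=... argument,
# imitating the documented "later option overrides earlier one" rule.
effective_xarch() {
    last=""
    for arg in "$@"; do
        case "$arg" in
            -xarch=*) last="${arg#-xarch=}" ;;   # strip the "-xarch=" prefix
        esac
    done
    printf '%s\n' "$last"
}

effective_xarch -xarch=sse2 -xarch=aes -xarch=sse4_2 -c test.cxx   # prints "sse4_2"
effective_xarch -xarch=sse2 -xarch=sse4_2 -xarch=aes -c test.cxx   # prints "aes"
```

Applied to the command from the question, that means either passing -xarch=aes alone or moving it after the other two -xarch options.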
I'm going to toss in an answer for those coming from GCC. In the GCC world, we use -march=native and GCC defines macros like __SSE2__, __SSE4_1__, __SSE4_2__, __AES__, __AVX__, __BMI__, etc.
SunCC does not work the way GCC does. It does not provide defines like __SSE2__, nor does it expose the value passed to -xarch.
Here are the references to the relevant Sun Studio manuals and the -xarch options/instruction set choices:
- Sun Studio 12 Update 1 User's Guide (PDF; could not find online)
- Solaris Studio 12.2 User's Guide
- Solaris Studio 12.3 User's Guide
- Solaris Studio 12.4 User's Guide
- Developer Studio 12.5 User's Guide
Here's how we are determining which flags we can use, and then converting them to GCC-style preprocessor macros. It's awful, but I don't know how to get the code generated otherwise.
CXX=...
GREP=...
EGREP=...

# CPU features reported by the OS
X86_CPU_FLAGS=$(isainfo -v 2>/dev/null)

# Each count is non-zero when the "CC -V" banner reports at least that C++ version
SUNCC_510_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[0-9]|5\.[2-9][0-9]|[6-9]\.)")
SUNCC_511_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[1-9]|5\.[2-9][0-9]|[6-9]\.)")
SUNCC_512_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[2-9]|5\.[2-9][0-9]|[6-9]\.)")
SUNCC_513_OR_ABOVE=$("$CXX" -V 2>&1 | "$EGREP" -c "CC: (Sun|Studio) .* (5\.1[3-9]|5\.[2-9][0-9]|[6-9]\.)")

# Later matches overwrite SUNCC_XARCH, so the last feature found decides -xarch.
# Note: grep -c matches substrings, so "sse3" also matches "ssse3" and "avx" also matches "avx2".
SUNCC_XARCH=
if [[ ("$SUNCC_511_OR_ABOVE" -ne "0") ]]; then
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE2__"); SUNCC_XARCH=sse2; fi
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse3") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE3__"); SUNCC_XARCH=sse3; fi
    if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "ssse3") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSSE3__"); SUNCC_XARCH=ssse3; fi
    if [[ ("$SUNCC_512_OR_ABOVE" -ne "0") ]]; then
        if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse4.1") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE4_1__"); SUNCC_XARCH=sse4_1; fi
        if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "sse4.2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__SSE4_2__"); SUNCC_XARCH=sse4_2; fi
        if [[ ("$SUNCC_513_OR_ABOVE" -ne "0") ]]; then
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "aes") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AES__"); SUNCC_XARCH=aes; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "pclmulqdq") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__PCLMUL__"); SUNCC_XARCH=aes; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "rdrand") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__RDRND__"); SUNCC_XARCH=avx_i; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "rdseed") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__RDSEED__"); SUNCC_XARCH=avx_i; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "avx") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AVX__"); SUNCC_XARCH=avx; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "avx2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__AVX2__"); SUNCC_XARCH=avx2; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "bmi") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__BMI__"); SUNCC_XARCH=avx2; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "bmi2") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__BMI2__"); SUNCC_XARCH=avx2; fi
            if [[ ($(echo -n "$X86_CPU_FLAGS" | "$GREP" -c "adx") -ne "0") ]]; then PLATFORM_CXXFLAGS+=("-D__ADX__"); SUNCC_XARCH=avx2_i; fi
        fi
    fi
fi

# Only pass -xarch when a feature level was actually selected
if [[ -n "$SUNCC_XARCH" ]]; then
    PLATFORM_CXXFLAGS+=("-xarch=$SUNCC_XARCH")
fi
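One thing to watch in feature detection like this is that a plain grep for "sse3" also matches "ssse3", and "avx" also matches "avx2". The following is a minimal sketch of a whole-word test that avoids that. The helper name has_flag and the sample flag list are assumptions made for illustration; the list is not captured from a real machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if flag $2 appears as a whole word
# in the whitespace-separated list $1.
has_flag() {
    for f in $1; do            # unquoted on purpose: split on spaces, tabs, newlines
        [ "$f" = "$2" ] && return 0
    done
    return 1
}

# Illustrative flag list (an assumption, not real isainfo output).
sample_flags="avx aes ssse3 sse2"

has_flag "$sample_flags" ssse3 && echo "ssse3: yes"
has_flag "$sample_flags" sse3  || echo "sse3: no"
has_flag "$sample_flags" avx2  || echo "avx2: no"
```

In the script, the list argument would be the output of isainfo -v, and each grep -c test would become a has_flag call.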
The gyrations above allow us to do things like this (except we need SSE2 through ADX).
#if (_MSC_VER >= 1700) || defined(__RDRND__)
    uint64_t val;
    if(_rdrand64_step(&val))
    {
        // Use RDRAND value
    }
#endif
Without the gyrations, we continually crash the 12.1 through 12.3 compilers during testing with the inline assembly and intrinsics.
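Because so much depends on which compiler release is in use, the version gate is worth getting right. Regex-based gates are easy to get subtly wrong (a pattern such as 5\.[2-9] matches 5.9 as well as 5.20), so here is a minimal sketch of a numeric comparison instead. The function name version_at_least is hypothetical, and the sketch assumes the MAJOR.MINOR number has already been extracted from the compiler banner.

```shell
#!/bin/sh
# Hypothetical helper: version_at_least HAVE NEED
# Succeeds when HAVE >= NEED, comparing MAJOR and MINOR as integers.
version_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    need_major=${2%%.*}; need_minor=${2#*.}
    [ "$have_major" -gt "$need_major" ] && return 0
    [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]
}

version_at_least 5.12 5.11 && echo "5.12 is at least 5.11"
version_at_least 5.9 5.10  || echo "5.9 is older than 5.10"
```

Comparing the parts as integers is what keeps 5.9 from being treated as newer than 5.10.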
The result of running the script gives us the recipe for CFLAGS and CXXFLAGS. Below is from a 4th gen Core i5. Xeons produce different results, as does a 5th gen Core i5. For example, a 5th gen Core i5 will have ADX and use -xarch=avx_i.
Pathname: /opt/solstudio12.2/bin/CC (symlinked)
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -xarch=ssse3
Pathname: /opt/solarisstudio12.3/bin/CC (symlinked)
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -xarch=sse4_2
Pathname: /opt/solarisstudio12.4/bin/CC
CXXFLAGS: -D__SSE2__ -D__SSE3__ -D__SSSE3__ -D__SSE4_1__ -D__SSE4_2__ -D__AES__ -D__PCLMUL__ -D__RDRND__ -D__AVX__ -xarch=avx
...