I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:
$ make
...
g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
make: *** [ppc-simd.o] Error 1
The failure is due to:
typedef __vector unsigned long long uint64x2_p8;
I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec, the machine reports 64-bit availability:
$ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
#define _ARCH_PPC 1
#define _ARCH_PPC64 1
#define __POWERPC__ 1
The OpenPOWER manual (section 6.1, Vector Data Types) has good information on vector data types, but it does not discuss when the vector long long types are available.
What is the availability of __vector unsigned long long? When can I use the typedef?
TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.
It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.
#include <altivec.h>
auto vec32(void) { // compiles with your options: Power4
return vec_splats((int) 1);
}
// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
return vec_splats((long long) 1);
}
(With auto instead of vector long long as the return type, the 2nd function compiles without VSX, but it returns the value in two 64-bit integer registers instead of a vector register.)
Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but -mcpu=power6 doesn't.
source + asm on Godbolt (PowerPC64 gcc6.3)
# with auto without VSX:
vec64(): # -O3 -mcpu=power4 -maltivec -mregnames
li %r4,1
li %r3,1
blr
vec64(): # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0: addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
addis %r9,%r2,.LC0@toc@ha
addi %r9,%r9,.LC0@toc@l # TOC-relative addressing (via r2) for the static constant
lxvd2x %vs34,0,%r9 # vector load?
xxpermdi %vs34,%vs34,%vs34,2
blr
.LC0: # in .rodata
.quad 1
.quad 1
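To address the "when can I use the typedef" part directly, one option is to key the typedef on the compiler's feature macros rather than on _ARCH_PPC64. The following is a minimal sketch, assuming GCC's predefined __VSX__ macro (set when VSX is enabled, e.g. by -mvsx or -mcpu=power7) is the right gate. The kHasVector64 constant and the scalar fallback struct are hypothetical names introduced only for illustration; they are not from the original ppc-crypto.h.

```cpp
#include <cstdint>

#if defined(__ALTIVEC__)
# include <altivec.h>
#endif

#if defined(__VSX__)
// 64-bit vector elements are only accepted when VSX is enabled
// (-mvsx, or a -mcpu= value such as power7 that turns it on).
typedef __vector unsigned long long uint64x2_p8;
static const int kHasVector64 = 1;   // hypothetical flag for callers
#else
// Hypothetical scalar stand-in (an assumption, not the original code) so that
// code referring to the type still compiles on Power4/G5-class targets.
struct uint64x2_p8 { std::uint64_t lane[2]; };
static const int kHasVector64 = 0;
#endif

// Both variants occupy 128 bits.
static_assert(sizeof(uint64x2_p8) == 16, "uint64x2_p8 should be 16 bytes");
```

With -mcpu=power4 -maltivec this takes the fallback branch, so the 'long long' in AltiVec types error should not be triggered; with -mvsx or -mcpu=power7 it uses the real vector type.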
And BTW, vec_splats (splat scalar) with a constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.
The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
This Intel SSE to PowerPC AltiVec migration guide looks mostly good, but got that wrong (it claims that vec_splats splats a signed byte).