I am new to GCC's C vector extensions. I am considering using them in my project, but their utility is (somewhat) contingent on being able to efficiently move all elements of a vector one position to the left and store the result in a new vector. How can I do this efficiently (ideally in a SIMD-accelerated way)?
So, basically:
- OriginalVector = {1, 2, 3, 4, 5, 6, 7, 8}
- ShiftedVector = {2, 3, 4, 5, 6, 7, 8, X} (where X can be anything.)
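To make the desired transformation concrete, here is a minimal sketch using GCC's `__builtin_shuffle`. The element type (16-bit `short`), the typedef `v8hi`, and the function name `shift_left_one` are assumptions made for illustration, not anything fixed by the question. The mask rotates the elements, so the last lane ends up holding the old first element, which is acceptable because X can be anything. The Clang branch uses `__builtin_shufflevector` and is only there so the sketch also builds with that compiler.

```c
/* Assumed element type: eight 16-bit lanes in one 128-bit vector. */
typedef short v8hi __attribute__((vector_size(16)));

/* Returns a new vector with every element moved one lane to the left.
   The argument is passed by value, so the caller's vector is untouched.
   Lane 7 receives the old lane 0 (a "don't care" value). */
static v8hi shift_left_one(v8hi v)
{
#if defined(__clang__)
    /* Clang spells the builtin differently and takes constant indices. */
    return __builtin_shufflevector(v, v, 1, 2, 3, 4, 5, 6, 7, 0);
#else
    /* GCC: the mask is an integer vector with the same lane count and width. */
    const v8hi mask = {1, 2, 3, 4, 5, 6, 7, 0};
    return __builtin_shuffle(v, mask);
#endif
}
```

Whether this compiles down to a single instruction depends on the target flags and compiler version, so it is worth checking the generated assembly.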
Background information (you can skip this): The purpose of such a transformation is to deal with matrices in which each row is represented as a vector. Specifically, it would let me treat ShiftedVector as the upper-left diagonal for the row beneath and compare all values in one SIMD operation. If there is another way to compare a vector with another vector offset by one element, that would solve the problem too. I'm assuming there isn't, though, and that the most efficient way to perform this comparison is to move all the elements leftward and then compare 1:1.
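The 1:1 comparison described here might look like the following sketch. It assumes 16-bit elements and uses hypothetical names (`v8hi`, `diagonal_matches`). With GCC vector types, `==` compares lane by lane and yields -1 (all bits set) in lanes that match and 0 elsewhere. Because the last lane of the shifted vector holds a don't-care value, the sketch masks that lane out.

```c
/* Assumed element type: eight 16-bit lanes in one 128-bit vector. */
typedef short v8hi __attribute__((vector_size(16)));

/* shifted_upper: the upper row already moved one lane to the left.
   lower:         the row beneath it.
   Result: -1 in every lane where the two agree, 0 otherwise.
   Lane 7 is forced to 0 because its input is meaningless. */
static v8hi diagonal_matches(v8hi shifted_upper, v8hi lower)
{
    const v8hi valid = {-1, -1, -1, -1, -1, -1, -1, 0};
    return (shifted_upper == lower) & valid;
}
```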
General stipulations:
- The original vector mustn't be harmed in the process
- It is fine if I have to use an x86 intrinsic function of some sort, but I don't know which or how
- It is fine if I lose the left-most element in the vector and introduce gibberish in the right-most
- It is fine if the most efficient method is an unaligned load of the original vector from its second position to end+1, but I still would like to know how to best code this
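The unaligned-load option from the last bullet can be sketched as follows, under the assumption that each row is stored as a plain array of 16-bit elements with at least one element of padding after the eighth, since the load reads one element past the end of the row. The names `v8hi` and `load_shifted` are hypothetical. `memcpy` is used because it has no alignment requirement; compilers typically turn a fixed-size copy like this into a single unaligned vector load, though that is not guaranteed.

```c
#include <string.h>

/* Assumed element type: eight 16-bit lanes in one 128-bit vector. */
typedef short v8hi __attribute__((vector_size(16)));

/* Loads row[1] .. row[8] into a vector.
   The caller must guarantee that row has at least 9 readable elements;
   row[8] lands in the last lane as the "don't care" value.
   The source array itself is not modified. */
static v8hi load_shifted(const short *row)
{
    v8hi v;
    memcpy(&v, row + 1, sizeof v);
    return v;
}
```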
The bottleneck here seems to be the lack of general information on how to use the intrinsics. People appear to use either assembly (which I am no expert in) or auto-vectorization (which doesn't work well here), so vector types seem like the most logical choice.
Thanks!