I have recently started looking into opportunities to speed up my code by using vector instructions. My code heavily relies on operations with sets - for simplicity let us assume that these are represented as sorted arrays of 16bit unsigned integers. The operations I need to perform are:
- Intersection (i.e., each element contained in both sets is to be present in the output set)
- Union (i.e., each element that is contained in at least one of the sets is to be present in the output set exactly once)
- Merge (works exactly as merge in merge-sort, the output is a sorted "concatenation" of both arrays)
I have found works focusing on vectorizing intersection (e.g., http://www.adms-conf.org/p1-SCHLEGEL.pdf) but I had little luck with finding anything for the other two operations. I am curious, is it possible to speed up also the other two operations by vectorization? (I am targeting CPUs either with AVX or AVX2 instruction sets.)