Pairwise addition in NEON

Published 2019-08-23 18:22

Question:

I want to add the values at indices 0 and 1 of an int64x2_t vector in NEON. I cannot find a pairwise-add instruction that does this.

int64x2_t sum_64_2;
// The result I expect is:
// int64_t result = sum_64_2[0] + sum_64_2[1];

Is there a NEON instruction that does this?

Answer 1:

You can write it in two ways. This one explicitly uses the NEON VADD.I64 instruction:

int64x1_t f(int64x2_t v)
{
  return vadd_s64 (vget_high_s64 (v), vget_low_s64 (v));
}

and the following one relies on the compiler to correctly select between using the NEON and general integer instruction sets. GCC 4.9 does the right thing in this case, but other compilers may not.

int64x1_t g(int64x2_t v)
{
  /* Initialized so that vset_lane_s64 does not read an indeterminate value. */
  int64x1_t r = vdup_n_s64(0);
  r = vset_lane_s64(vgetq_lane_s64(v, 0) + vgetq_lane_s64(v, 1), r, 0);
  return r;
}

When targeting 32-bit ARM, the generated code is efficient. For AArch64, extra instructions are emitted; the compiler could do better there.
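
Since the question asks for a plain int64_t rather than an int64x1_t, here is a minimal sketch of a wrapper that returns the scalar sum. The names sum_lanes_s64 and sum_pair are hypothetical and exist only for illustration. On AArch64 the sketch assumes the across-vector intrinsic vaddvq_s64 can be used, which the answer above does not mention; on 32-bit NEON it reuses the vadd_s64 approach and reads lane 0. The scalar fallback is only there so the snippet also builds on hosts without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: returns v[0] + v[1] as a plain int64_t. */
static inline int64_t sum_lanes_s64(int64x2_t v)
{
#if defined(__aarch64__)
    /* AArch64 only: add across the whole vector. */
    return vaddvq_s64(v);
#else
    /* 32-bit NEON: add the high and low 64-bit halves, then read lane 0. */
    return vget_lane_s64(vadd_s64(vget_high_s64(v), vget_low_s64(v)), 0);
#endif
}

/* Hypothetical wrapper: loads two int64_t values and sums them. */
int64_t sum_pair(const int64_t p[2])
{
    return sum_lanes_s64(vld1q_s64(p));
}

#else

/* Scalar fallback (assumption: no NEON available on this host). */
int64_t sum_pair(const int64_t p[2])
{
    return p[0] + p[1];
}

#endif
```

Calling sum_pair on an array holding the two lane values gives the same result as sum_64_2[0] + sum_64_2[1] in the question. Whether the AArch64 path compiles to fewer instructions than the two functions above depends on the compiler and its version, so it is worth checking the generated assembly.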