I am looking at the Intel Intrinsics Guide:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
While it has _mm_dp_ps and _mm_dp_pd for calculating the dot product of floats and doubles, I cannot see anything for calculating an integer dot product.
I have two unsigned int[8] arrays and I would like to compute:
(a[0] * b[0]) + (a[1] * b[1]) + ... + (a[num_elements_in_array-1] * b[num_elements_in_array-1])
Is there a way to multiply the elements in batches of four and sum the products?
Every time someone does this:
.. a puppy dies.
Use one of these:
Cast x as necessary.
There is no integer version of _mm_dp_ps. But you can do what you were about to do: multiply 4 by 4 integers and accumulate the sum of the products.
So something like this (not tested, doesn't compile)
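A minimal sketch of that approach, under stated assumptions: SSE4.1 is available (for _mm_mullo_epi32), the length is a multiple of 4, and wrap-around modulo 2^32 is acceptable for the result. The function name dot_u32_sse41 is hypothetical, and the target attribute assumes GCC or Clang. It uses unaligned loads with a pointer cast, accumulates the products lane by lane, and does a single horizontal reduction at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_mullo_epi32 */

/* Hypothetical helper: dot product of two uint32_t arrays, modulo 2^32.
   Assumption: n is a multiple of 4. The target attribute (GCC/Clang)
   enables SSE4.1 for this function without a global compiler flag. */
__attribute__((target("sse4.1")))
uint32_t dot_u32_sse41(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(n % 4 == 0);

    __m128i sum = _mm_setzero_si128();  /* four running partial sums */

    for (size_t i = 0; i < n; i += 4) {
        /* Unaligned loads; the pointers are cast to __m128i as needed. */
        __m128i x = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i y = _mm_loadu_si128((const __m128i *)(b + i));

        /* Low 32 bits of each of the four products, added vertically. */
        sum = _mm_add_epi32(sum, _mm_mullo_epi32(x, y));
    }

    /* One horizontal reduction: fold the upper half onto the lower half,
       then fold the two remaining lanes together. */
    sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, _MM_SHUFFLE(1, 0, 3, 2)));
    sum = _mm_add_epi32(sum, _mm_shuffle_epi32(sum, _MM_SHUFFLE(2, 3, 0, 1)));

    return (uint32_t)_mm_cvtsi128_si32(sum);
}
```

For two unsigned int[8] arrays this would be called as dot_u32_sse41(a, b, 8). If the products can exceed 32 bits and the full value matters, a wider accumulator would be needed; this sketch does not handle that.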
As discussed in the comments and chat, that reorders the sums in such a way as to minimize the number of horizontal sums required, by doing most sums vertically.
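One way to write that reordering down, assuming the length n is a multiple of 4 and the arithmetic wraps modulo 2^32 (so regrouping the terms does not change the result):

$$\sum_{i=0}^{n-1} a_i b_i \;=\; \sum_{k=0}^{3} \left( \sum_{j=0}^{n/4-1} a_{4j+k}\, b_{4j+k} \right)$$

Each inner sum is one vector lane accumulated vertically across the loop iterations; the outer sum over the four lanes k is the only horizontal step. For n = 8, lane 0 holds a[0]*b[0] + a[4]*b[4], lane 1 holds a[1]*b[1] + a[5]*b[5], and so on.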