Comparison between OpenMP and Vectorization

Posted 2019-05-21 12:53

Question:

Given an example function (shown below), the for loop can either be parallelized with OpenMP or vectorized (assuming the compiler does the vectorization).

Example

void function(float* a, float* b, float* c, int n)
{
      for(int i = 0; i < n; i++)
      {
          c[i] = a[i] * b[i];
      }
}

I would like to know

  1. Will there be any difference in performance between OpenMP and vectorization?
  2. Is there any advantage in using one over the other?
  3. Is it possible to use OpenMP and vectorization together?

Note: I haven't given any thought to the different SSE versions, the number of processors/cores (as the number of threads scales up in OpenMP), etc. My question is a general one, but more specific answers are welcome too.

Answer 1:

OpenMP and vectorisation are not competing technologies; they complement one another. Vectorisation can improve the serial performance of CPU cores that have vector capabilities (SSE/3DNow!/Altivec/etc.) and thus make each thread run faster, while OpenMP can employ more than one of the available cores to run multiple threads and solve a larger problem in parallel.

In summary:

  • a vectorised serial application usually runs faster than a non-vectorised serial application;
  • a non-vectorised OpenMP application usually runs faster (if correctly written and if the algorithm allows parallelisation) than a non-vectorised serial application;
  • a vectorised OpenMP application usually runs faster than a non-vectorised OpenMP application, which in turn usually runs faster than a non-vectorised serial application.
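
For the example function, the three cases above can be sketched roughly as follows. This is only an illustration, not a benchmark: the function names are hypothetical, the `restrict` qualifiers assume the arrays never overlap (something the original function does not promise), and the pragmas only take effect when the compiler is run with OpenMP support (for example `-fopenmp` with GCC or Clang). Without that flag the pragmas are ignored and all three versions run serially.

```c
/* Serial version. `restrict` (an assumption: the arrays never overlap)
   makes it easier for the compiler to auto-vectorise the loop. */
void multiply_serial(const float *restrict a, const float *restrict b,
                     float *restrict c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* OpenMP threads: iterations are split across the thread team.
   Vectorisation is left to the compiler's auto-vectoriser. */
void multiply_omp(const float *restrict a, const float *restrict b,
                  float *restrict c, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Threads + SIMD (needs OpenMP 4.0 or later): each thread receives a
   chunk of iterations and is asked to vectorise that chunk. */
void multiply_omp_simd(const float *restrict a, const float *restrict b,
                       float *restrict c, int n)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}
```

All three compute the same result; only the way the work is distributed over cores and vector lanes differs.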

Vectorisation is only data parallel (it applies the same operation to multiple data items) and works at the lowest hardware level possible (core/ALU), while OpenMP can be data parallel, task parallel, or both, and is an abstraction at a much higher level.

As always, there is the "it depends" argument: the performance of vectorisation, OpenMP, or vectorisation plus OpenMP can depend on the hardware, memory bandwidth, cache usage, and so on.

Concerning your example function, it depends on how large the arrays are. If they are too small, OpenMP gives no benefit and can even slow execution down because of the overhead of starting and synchronising threads. Vectorisation is likely to improve the execution time.
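
One way to act on the size argument is OpenMP's `if` clause, which keeps the loop on a single thread when the array is short. The sketch below is illustrative only: the function name and the cut-off `MIN_PARALLEL_N` are hypothetical, and a sensible cut-off has to be found by measuring on the target machine.

```c
/* Hypothetical cut-off; the right value depends on the machine and
   has to be measured. */
#define MIN_PARALLEL_N 100000

void multiply_adaptive(const float *a, const float *b, float *c, int n)
{
    /* When the condition is false, the parallel region is executed by
       a single thread, which avoids most of the threading overhead.
       The loop body is still a candidate for auto-vectorisation. */
    #pragma omp parallel for if(n >= MIN_PARALLEL_N)
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}
```

Compiled without OpenMP support, the pragma is ignored and the function is simply the serial loop.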



Answer 2:

  1. Yes.
  2. Measure, don't argue.
  3. Yes.
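
A minimal sketch of what "measure" could look like, assuming a C11 toolchain that provides `timespec_get`. The names `kernel_fn`, `wall_seconds`, `best_time` and `multiply_plain` are hypothetical helpers introduced for illustration. `TIME_UTC` is wall-clock time and is not guaranteed to be monotonic, so treat very small timings with caution.

```c
#include <time.h>

/* Signature shared by the kernels being compared (hypothetical name). */
typedef void (*kernel_fn)(const float *, const float *, float *, int);

/* Plain kernel to time; swap in an OpenMP or SIMD variant to compare. */
void multiply_plain(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Wall-clock time in seconds, using C11 timespec_get. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs `kernel` `repeats` times and returns the fastest run in seconds,
   or a negative value if `repeats` is not positive. Taking the minimum
   reduces noise from other processes. */
double best_time(kernel_fn kernel, const float *a, const float *b,
                 float *c, int n, int repeats)
{
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        double t0 = wall_seconds();
        kernel(a, b, c, n);
        double dt = wall_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

Calling `best_time` once per variant, with the same arrays and a realistic `n`, gives numbers that can be compared directly on the machine in question.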