Why does the compiler ignore OpenMP pragmas?

2019-08-01 09:51发布

问题:

In the following C code I am using OpenMP in a nested loop. Since race condition occurs, I want to perform atomic operations at the end:

double mysumallatomic() {

  double S2 = 0.;
  #pragma omp parallel for shared(S2)
  for(int a=0; a<128; a++){
    for(int b=0; b<128;b++){
      double myterm = (double)a*b;
      #pragma omp atomic
      S2 += myterm;
    }
  }
  return S2;
}

The thing is that #pragma omp atomic has no effect on the program behaviour, even if I remove it, nothing happens. Even if I change it to #pragma oh_my_god, I get no error!

I wonder what is going wrong here, whether I can tell the compiler to be more strict when checking omp pragmas or why I do not get an error when I make the last change

PS: For compilation I use:

gcc-4.2 -fopenmp main.c functions.c -o main_elec_gcc.exe

PS2: New code that gives me the same problem and based on gillespie idea:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <omp.h>
#include <math.h>

#define NRACK 64
#define NSTARS 1024

double mysumallatomic_serial(float rocks[NRACK][3], float moon[NSTARS][3],
                             float qr[NRACK],float ql[NSTARS]) {
  int j,i;
  float temp_div=0.,temp_sqrt=0.;
  float difx,dify,difz;
  float mod2x, mod2y, mod2z;
  double S2 = 0.;

  for(j=0; j<NRACK; j++){
    for(i=0; i<NSTARS;i++){     
      difx=rocks[j][0]-moon[i][0];
      dify=rocks[j][1]-moon[i][1];
      difz=rocks[j][2]-moon[i][2];
      mod2x=difx*difx;
      mod2y=dify*dify;
      mod2z=difz*difz;
      temp_sqrt=sqrt(mod2x+mod2y+mod2z);
      temp_div=1/temp_sqrt;
      S2 += ql[i]*temp_div*qr[j];       
    }
  }
  return S2;
}

double mysumallatomic(float rocks[NRACK][3], float moon[NSTARS][3], 
                      float qr[NRACK],float ql[NSTARS]) {
  float temp_div=0.,temp_sqrt=0.;
  float difx,dify,difz;
  float mod2x, mod2y, mod2z;
  double S2 = 0.;

  #pragma omp parallel for shared(S2)
  for(int j=0; j<NRACK; j++){
    for(int i=0; i<NSTARS;i++){
      difx=rocks[j][0]-moon[i][0];
      dify=rocks[j][1]-moon[i][1];
      difz=rocks[j][2]-moon[i][2];
      mod2x=difx*difx;
      mod2y=dify*dify;
      mod2z=difz*difz;
      temp_sqrt=sqrt(mod2x+mod2y+mod2z);
      temp_div=1/temp_sqrt;
      float myterm=ql[i]*temp_div*qr[j];    
      #pragma omp atomic
      S2 += myterm;
    }
  }
  return S2;
}
int main(int argc, char *argv[]) {
  float rocks[NRACK][3], moon[NSTARS][3];
  float qr[NRACK], ql[NSTARS];
  int i,j;

  for(j=0;j<NRACK;j++){
    rocks[j][0]=j;
    rocks[j][1]=j+1;
    rocks[j][2]=j+2;
    qr[j] = j*1e-4+1e-3;
    //qr[j] = 1;
  }

  for(i=0;i<NSTARS;i++){
    moon[i][0]=12000+i;
    moon[i][1]=12000+i+1;
    moon[i][2]=12000+i+2;
    ql[i] = i*1e-3 +1e-2 ;
    //ql[i] = 1 ;
  }
  printf(" serial: %f\n", mysumallatomic_serial(rocks,moon,qr,ql));
  printf(" openmp: %f\n", mysumallatomic(rocks,moon,qr,ql));
  return(0);
}

回答1:

  1. Using the flag -Wall highlights pragma errors. For example, when I misspell atomic I get the following warning.

    main.c:15: warning: ignoring #pragma omp atomic1

  2. I'm sure you know, but just in case, your example should be handled with a reduction

  3. When you use omp parallel, the default is for all variables to be shared. This is not what you want in your case. For example, each thread will have a different value difx. Instead, your loop should be:

    #pragma omp parallel for default(none),\
    private(difx, dify, difz, mod2x, mod2y, mod2z, temp_sqrt, temp_div, i, j),\
    shared(rocks, moon, ql, qr), reduction(+:S2)
    for(j=0; j<NRACK; j++){
      for(i=0; i<NSTARS;i++){
        difx=rocks[j][0]-moon[i][0];
        dify=rocks[j][1]-moon[i][1];
        difz=rocks[j][2]-moon[i][2];
        mod2x=difx*difx;
        mod2y=dify*dify;
        mod2z=difz*difz;
        temp_sqrt=sqrt(mod2x+mod2y+mod2z);
        temp_div=1/temp_sqrt;
        S2 += ql[i]*temp_div*qr[j];  
      }
    }
    


回答2:

I know this is an old post, but I think the problem is the order of the parameters of gcc, -fopenmp should be at the end of the compilation line.



回答3:

First, depending on the implementation, reduction might be better than using atomic. I would try both and time them to see for sure.

Second, if you leave off the atomic, you may or may not see the problem (wrong result) associated with the race. It is all about timing, which from one run to the next can be quite different. I have seen cases where the result was wrong only once in 150,000 runs and others where it has been wrong all the time.

Third, the idea behind pragmas was that the user doesn't need to know about them if they don't have an effect. Besides that, the philosophy in Unix (and its derivatives) is that it is quiet unless there is a problem. Saying that, many implementations have some sort of flag so the user can get more information because they didn't know what was happening. You can try -Wall with gcc, and at least it should flag the oh_my_god pragma as being ignored.



回答4:

You have

#pragma omp parallel for shared(S2)
  for(int a=0; a<128; a++){
   ....

So the only parallelization will be to the for loop.

If you want to have the atomic or reduction you have to do

#pragma omp parallel 
{
 #pragma omp for shared(S2)
   for(int a=0; a<128; a++){
     for(int b=0; b<128;b++){
       double myterm = (double)a*b;
       #pragma omp atomic
        S2 += myterm;
     } // end of second for
   } // end of 1st for
} // end of parallel code
return S2;
} // end of function

Otherwise everything after # will be comment