GSL + OpenMP: integration fails in parallel but works in serial

Posted 2019-08-21 02:40

This is my first post on here, so go easy on me! I have a very strange problem. I've written a C code that converts particle data to grid data (the data comes from a cosmological simulation). To do this conversion, I am using the GSL Monte Carlo VEGAS integrator. When I run the code in serial, it runs just fine and gives me the correct answer (albeit slowly). As an attempt at speeding it up, I tried OpenMP.

The problem is that when I run it in parallel, the integration times out (I set a MAX_ITER variable in the integration loop to avoid an infinite loop due to lack of convergence). The random number generator is set up and initialized before the parallel block of code. I checked and double checked, and all of the data about the particle that it fails on in parallel (the x, y, and z positions and the integration bounds being passed to the integrator) are the same in both serial and parallel. I also tried increasing my MAX_ITER variable from 100 to 1000, but that did nothing; it just took longer to fail.
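
To be concrete about the RNG, it is created and seeded once before the parallel region, along these lines (a sketch: init_rng, the generator choice, and the seed are placeholders for whatever my code actually uses):

#include <gsl/gsl_rng.h>

gsl_rng *random_generator;   // later passed to gsl_monte_vegas_integrate()

void init_rng(void)
{
   gsl_rng_env_setup();                                 // sets gsl_rng_default from the environment
   random_generator = gsl_rng_alloc(gsl_rng_default);   // defaults to mt19937 unless overridden
   gsl_rng_set(random_generator, 42);                   // example seed
}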

My question, then, is whether anyone has any idea why the code runs in serial but times out in parallel when using the exact same particles.

Also, in case you want them, the numbers for the offending particle are: x = 0.630278, y = 24.952896, z = 3.256376, h = 3 (this is the particle's smoothing length, which serves to "smear out" the particle's mass, since the goal of the simulation is to use particles to sample a fluid; this is the SPH method). The x integration bounds (lower, upper) are {0, 3.630278}, the y bounds are {21.952896, 27.952896}, and the z bounds are {0.256376, 6.256375}.

The idea behind the conversion is that the particle's mass is contained within a "smoothing sphere" of radius h centered on the particle itself. This mass is not distributed uniformly, but rather according to the SPH kernel (this is the function I'm integrating). Thus, depending on how the sphere sits within its "home cell", only part of the sphere may actually be inside that cell. The goal, then, is to work out the appropriate bounds of integration and pass them to the integrator. The integrand (sketched below) has a check: if the point handed to it by the Monte Carlo integrator lies outside the sphere, it returns 0 (this is because getting the exact limits of integration for every possible case is a huge pain).
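
To make that concrete, the integrand handed to the VEGAS routine looks roughly like this (a sketch: the names sph_kernel and kernel_params and the particular cubic-spline coefficients are illustrative, but the "return 0 outside the sphere" check is exactly the idea described above):

#include <math.h>
#include <gsl/gsl_monte.h>

// Illustrative parameter struct: the particle's position and smoothing length
// reach the integrand through the void *params pointer of gsl_monte_function.
typedef struct
{
   double x, y, z;   // particle position
   double h;         // smoothing length
} kernel_params;

// Integrand with the gsl_monte_function signature. Points outside the smoothing
// sphere contribute 0; inside, a standard cubic-spline SPH kernel is evaluated
// (the coefficients below are the usual Gadget-style normalization, as an example).
double sph_kernel(double *pt, size_t dim, void *params)
{
   (void)dim;   // always 3 here
   kernel_params *p = (kernel_params *)params;

   double dx = pt[0] - p->x;
   double dy = pt[1] - p->y;
   double dz = pt[2] - p->z;
   double q  = sqrt(dx * dx + dy * dy + dz * dz) / p->h;

   if(q > 1.0)
      return 0.0;   // outside the smoothing sphere: no contribution

   double norm = 8.0 / (M_PI * p->h * p->h * p->h);

   if(q <= 0.5)
      return norm * (1.0 - 6.0 * q * q + 6.0 * q * q * q);

   return norm * 2.0 * (1.0 - q) * (1.0 - q) * (1.0 - q);
}

This gets wrapped in a gsl_monte_function with dim = 3 inside integrate() (see the sketch after the integration loop below).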

The code for my loop is here:

// Loop over every particle
#pragma omp parallel shared(M_P, m_header, NumPartTot, num_grid_elements, cxbounds, \
                            cybounds, czbounds, master_cell) \
                     private(index, x, y, z, i, j, k, h, tid, cell, corners)
{
   tid = omp_get_thread_num();

   // Set up cell struct. Allocate memory!
   cell = set_up_cell();

   #pragma omp for
   for(index = 1; index <= NumPartTot; index++)
   {
      printf("\n\n\n************************************\n");
      printf("Running particle: %d on thread: %d\n", index, tid);
      printf("x = %f  y = %f   z = %f\n", M_P[index].Pos[0], M_P[index].Pos[1], M_P[index].Pos[2]);
      printf("**************************************\n\n\n");
      fflush(stdout);

      // Set up convenience variables
      x = M_P[index].Pos[0];
      y = M_P[index].Pos[1];
      z = M_P[index].Pos[2];

      // Figure out which cell the particle is in
      i = (int)((x / m_header.BoxSize) * num_grid_elements);
      j = (int)((y / m_header.BoxSize) * num_grid_elements);
      k = (int)((z / m_header.BoxSize) * num_grid_elements);

      corners = get_corners(i, j, k);

      // Check to see what type of particle we're dealing with
      if(M_P[index].Type == 0)
      {
         h = M_P[index].hsml;
         convert_gas(i, j, k, x, y, z, h, index, cell, corners);
      }
      else
      {
         update_cell_non_gas_properties(index, i, j, k, cell);
      }
   }

   // Copy each thread's version of cell to cell_master
   #ifdef _OPENMP
      copy_to_master_cell(cell);
      free_cell(cell);
   #endif
} /*-- End of parallel region --*/

The problem occurs in the function convert_gas. The problematic section is here (in the home cell block):

// Case 6: Left face
if(((x + h) < cxbounds[i][j][k].hi) && ((x - h) < cxbounds[i][j][k].lo) &&
   ((y + h) < cybounds[i][j][k].hi) && ((y - h) >= cybounds[i][j][k].lo) &&
   ((z + h) < czbounds[i][j][k].hi) && ((z - h) >= czbounds[i][j][k].lo))
{
  printf("Using case 6\n");
  fflush(stdout);

  // Home cell
  ixbounds.lo = cxbounds[i][j][k].lo;
  ixbounds.hi = x + h;
  iybounds.lo = y - h;
  iybounds.hi = y + h;
  izbounds.lo = z - h;
  izbounds.hi = z + h;

  kernel = integrate(ixbounds, iybounds, izbounds, h, index, i, j, k);

  update_cell_gas_properties(kernel, i, j, k, index, cell);

  // Left cell
  ixbounds.lo = x - h;
  ixbounds.hi = cxbounds[i][j][k].lo;
  iybounds.lo = y - h; // Not actual bounds. See note above.
  iybounds.hi = y + h;
  izbounds.lo = z - h;
  izbounds.hi = z + h;

  kernel = integrate(ixbounds, iybounds, izbounds, h, index, i - 1, j, k);

  update_cell_gas_properties(kernel, i - 1, j, k, index, cell);

  return;

}

The data that I'm currently using is test data, so I know exactly where the particles should be and what integration bounds they should have. When using gdb, I find that all of these numbers are correct. The integration loop in the function integrate is here (TOLERANCE is 0.2, WARM_CALLS is 10000, and N_CALLS is 100000):

gsl_monte_vegas_init(monte_state);

// Warm up
gsl_monte_vegas_integrate(&monte_function, lower_bounds, upper_bounds, 3,
                          WARM_CALLS, random_generator, monte_state, &result, &error);

// Actual integration
do
{
   gsl_monte_vegas_integrate(&monte_function, lower_bounds, upper_bounds, 3,
                             N_CALLS, random_generator, monte_state, &result, &error);
   iter++;

} while(fabs(gsl_monte_vegas_chisq(monte_state) - 1.0) > TOLERANCE && iter < MAX_ITER);

if(iter >= MAX_ITER)
{
   fprintf(stdout, "ERROR!!! Max iterations %d exceeded!!!\n"
           "See M_P[%d].id : %d   (%f %f %f)\n"
           "lower bnds : (%f %f %f)   upper bnds : (%f %f %f)\n"
           "trying to integrate in cell %d %d %d\n\n", MAX_ITER, pind, M_P[pind].id,
           M_P[pind].Pos[0], M_P[pind].Pos[1], M_P[pind].Pos[2],
           ixbounds.lo, iybounds.lo, izbounds.lo, ixbounds.hi, iybounds.hi, izbounds.hi, i, j, k);
   fflush(stdout);
   exit(1);
}
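
For completeness, the setup at the top of integrate(), above the loop just shown, amounts to something like this (again a sketch: sph_kernel and kernel_params are the illustrative names from above, and pind is the particle index as it appears in the error message):

// Sketch of the state/function setup in integrate() (names are illustrative).
// gsl/gsl_monte_vegas.h provides gsl_monte_vegas_alloc() and friends.
double lower_bounds[3] = { ixbounds.lo, iybounds.lo, izbounds.lo };
double upper_bounds[3] = { ixbounds.hi, iybounds.hi, izbounds.hi };

kernel_params params = { M_P[pind].Pos[0], M_P[pind].Pos[1], M_P[pind].Pos[2], h };
gsl_monte_function monte_function = { &sph_kernel, 3, &params };

gsl_monte_vegas_state *monte_state = gsl_monte_vegas_alloc(3);   // 3-D integral
double result = 0.0, error = 0.0;
int iter = 0;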

Again, this exact code (but without OpenMP; I pass that as a compile-time option in the makefile) runs with the exact same numbers in serial, but not in parallel. I'm sure it's something stupid that I've done and simply can't see at the moment (at least, I hope!). Anyway, thanks in advance for the help!
