How can I best “parallelise” a set of four nested

2020-07-17 14:30发布

I have the following homework task:
I need to brute force 4-char passphrase with the following mask

%%@@

( where @ - is a numeric character, % - is an alpha character )

in several threads using OpenMP.

Here is a piece of code, but I'm not sure if it is doing the right thing:

int i, j, m, n;

const char alph[26] = "abcdefghijklmnopqrstuvwxyz";
const char num[10] = "0123456789";

#pragma omp parallel for private(pass) schedule(dynamic) collapse(4)
for (i = 0; i < 26; i++)
    for (j = 0; j < 26; j++)
        for (m = 0; m < 10; m++)
            for (n = 0; n < 10; n++) {
                pass[0] = alph[i];
                pass[1] = alph[j];
                pass[2] = num[m];
                pass[3] = num[n];

                /* Working with pass here */

            }

So my question is :
How to correctly specify the "parallel for" instruction, in order to split the range of passphrases between several cores?

Help is much appreciated.

1条回答
别忘想泡老子
2楼-- · 2020-07-17 14:46

Your code is pretty much right, except for using alph instead of num. If you're able to define the pass variable within the loop, that'll save you many a headache.

A full MWE might look like:

//Compile with, e.g.: gcc -O3 temp.c -std=c99 -fopenmp
#include <stdio.h>
#include <unistd.h>
#include <string.h>

int PassCheck(char *pass){
  usleep(50); //Sleep for 100 microseconds to simulate work
  return strncmp(pass, "qr34", 4)==0;
}

int main(){
  const char alph[27] = "abcdefghijklmnopqrstuvwxyz";
  const char num[11]  = "0123456789";
  char goodpass[5] = "----"; //Provide a default password to indicate an error state

  int i, j, m, n;

  #pragma omp parallel for collapse(4)
  for (i = 0; i < 26; i++)
  for (j = 0; j < 26; j++)
  for (m = 0; m < 10; m++)
  for (n = 0; n < 10; n++){
    char pass[4];
    pass[0] = alph[i];
    pass[1] = alph[j];
    pass[2] = num[m];
    pass[3] = num[n];
    if(PassCheck(pass)){
      //It is good practice to use `critical` here in case two
      //passwords are somehow both valid. This won't arise in
      //your code, but is worth thinking about.
      #pragma omp critical
      {
        memcpy(goodpass, pass, 4);
        goodpass[4] = '\0';
        //#pragma omp cancel for //Escape for loops!
      }
    }
  }

  printf("Password was '%s'.\n",goodpass);

  return 0;
}

Dynamic scheduling

Using a dynamic schedule here is probably pointless. Your expectation should be that each password will take, on average, about the same amount of time to check. Therefore, each iteration of the loop will take about the same amount of time. Therefore, there is no need to use dynamic scheduling because your loops will remain evenly distributed.

Visual noise

Note that the loop nest is stacked, rather than indented. You'll often see this in code where there are many nested loops as it tends to reduce visual noise.

Breaking early

#pragma omp cancel for is available as of OpenMP 4.0; however, I got a warning using it in this context, so I've commented it out. If you are able to get it working, that'll reduce your run-time by half since all effort is wasted once the correct password has been found and the password will, on average, be located half-way through the search space.

Where the guessed password is generated

One of the commentors suggests moving, e.g. pass[0] so that it is not in the innermost loop. This is a bad idea as doing so will prevent you from using collapse(4). As a result you could parallelize the outer loop, but you run the risk that its iteration count cannot be evenly divided by the number of threads, resulting in a large load imbalance. Alternatively, you could parallelize the inner loop, which exposes you to the same problem plus high synchronization costs each time the loop ends.

Why usleep?

The usleep function causes the code to run slowly. This is intentional; it provides feedback on the effect of parallelism, since the workload is so small.

If I remove the usleep, then the code completes in 0.003s on a single core and 0.004s on 4 cores. You cannot tell that the parallelism is even working. Leaving usleep in gives 8.950s on a single core and 2.257s on 4 cores, an apt demonstration of the effectiveness of the parallelism.

Naturally, you would remove this line once you're sure that parallelism is working correctly.

Further, any actual brute-force password cracker would likely be computing an expensive hash function inside the PassCheck function. Including usleep() here allows us to simulate that function and experiment with high-level design without having to the function first.

查看更多
登录 后发表回答