I'd like to parallelize the following piece of code, but I am new to OpenMP and to writing parallel code.
    std::vector<DMatch> good_matches;
    for (int i = 0; i < descriptors_A.rows; i++) {
        if (matches_RM[i].distance < 3 * min_dist) {
            good_matches.push_back(matches_RM[i]);
        }
    }
I have tried
    std::vector<DMatch> good_matches;
    #pragma omp parallel for
    for (int i = 0; i < descriptors_A.rows; i++) {
        if (matches_RM[i].distance < 3 * min_dist) {
            good_matches[i] = matches_RM[i];
        }
    }
and
    std::vector<DMatch> good_matches;
    cv::DMatch temp;
    #pragma omp parallel for
    for (int i = 0; i < descriptors_A.rows; i++) {
        if (matches_RM[i].distance < 3 * min_dist) {
            temp = matches_RM[i];
            good_matches[i] = temp;
            // AND ALSO good_matches.push_back(temp);
        }
    }
I have also tried
    #pragma omp critical
    good_matches.push_back(matches_RM[i]);
This works, but it does not speed anything up. It may be that this loop cannot be sped up, but it would be great if it can. I'd also like to speed up this loop:
    std::vector<Point2f> obj, scene;
    for (int i = 0; i < good_matches.size(); i++) {
        obj.push_back(keypoints_A[good_matches[i].queryIdx].pt);
        scene.push_back(keypoints_B[good_matches[i].trainIdx].pt);
    }
Apologies if this question has already been answered, and thank you very much to anyone who can help.
One possibility may be to use private vectors for each thread and combine them at the end:
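A minimal sketch of that idea, assuming a hypothetical `Match` struct as a stand-in for `cv::DMatch` (so it builds without OpenCV), a hypothetical function name `filterMatchesOrdered`, and `threshold` playing the role of `3 * min_dist`. Each thread fills its own local vector, stores it in a per-thread slot, and the slots are concatenated after the parallel region. With `schedule(static)` and no chunk size, each thread gets at most one contiguous block of iterations, handed out in thread-number order, so concatenating the slots in order should keep the original match order. Build with `-fopenmp` (GCC/Clang); the `_OPENMP` guard lets it also compile serially.

```cpp
#include <vector>
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for cv::DMatch (only the fields used here).
struct Match {
    float distance;
    int queryIdx;
    int trainIdx;
};

// Keep matches with distance < threshold, using one private vector per thread.
std::vector<Match> filterMatchesOrdered(const std::vector<Match>& matches, float threshold) {
    const long n = static_cast<long>(matches.size());
    std::vector<std::vector<Match>> perThread;  // one slot per thread

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int nThreads = omp_get_num_threads();
        const int tid = omp_get_thread_num();
#else
        const int nThreads = 1;
        const int tid = 0;
#endif
        // One thread sizes the slot array; the implicit barrier at the end
        // of 'single' guarantees it is sized before any thread writes a slot.
        #pragma omp single
        perThread.resize(nThreads);

        std::vector<Match> local;  // private to this thread, no locking needed
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n; ++i) {
            if (matches[i].distance < threshold) {
                local.push_back(matches[i]);
            }
        }
        perThread[tid] = std::move(local);  // each thread writes only its own slot
    }

    // Serial merge in thread-number order.
    std::vector<Match> good;
    for (const auto& bucket : perThread) {
        good.insert(good.end(), bucket.begin(), bucket.end());
    }
    return good;
}
```

Moving the finished local vector into the shared slot, rather than calling `push_back` on `perThread[tid]` directly, avoids threads repeatedly updating neighbouring vector headers in the same cache line.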
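The actual speed-up depends mostly on the amount of work done in each iteration. Here each iteration is only a comparison and possibly a copy, so thread start-up and the final merge can cancel out much of the gain.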
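The second loop from the question (filling `obj` and `scene`) is simpler, because its output size is known in advance: one point per good match. A minimal sketch, assuming hypothetical stand-ins `Match`, `Point2f` and `KeyPoint` for the OpenCV types and a hypothetical function name `gatherPoints`: pre-size the outputs and let each iteration write only to its own index, so no critical section is needed. As noted above, with so little work per element the speed-up may still be small.

```cpp
#include <vector>

// Hypothetical stand-ins for cv::DMatch, cv::Point2f and cv::KeyPoint.
struct Match { float distance; int queryIdx; int trainIdx; };
struct Point2f { float x, y; };
struct KeyPoint { Point2f pt; };

// Fill obj/scene in parallel; iteration i writes only obj[i] and scene[i].
void gatherPoints(const std::vector<Match>& good,
                  const std::vector<KeyPoint>& keypointsA,
                  const std::vector<KeyPoint>& keypointsB,
                  std::vector<Point2f>& obj,
                  std::vector<Point2f>& scene) {
    obj.resize(good.size());    // size known up front: one point per match
    scene.resize(good.size());
    const long n = static_cast<long>(good.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Distinct indices per iteration, so threads never write the same element.
        obj[i] = keypointsA[good[i].queryIdx].pt;
        scene[i] = keypointsB[good[i].trainIdx].pt;
    }
}
```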
TBB's concurrent_vector acts much like std::vector, but allows parallel calls to push_back. I showed how to do this here: c-openmp-parallel-for-loop-alternatives-to-stdvector
Make private versions of the std::vector and fill the shared std::vector in a critical section like this:
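A minimal sketch of this approach, assuming a hypothetical `Match` struct in place of `cv::DMatch`, a hypothetical function name `filterMatchesCritical`, and `threshold` standing in for `3 * min_dist`. Each thread fills a vector declared inside the parallel region (and therefore private), then appends it to the shared result inside `#pragma omp critical`. The lock is taken once per thread rather than once per match, which is what makes this faster than the per-element critical section from the question. The order of the merged matches depends on which thread enters the critical section first, so it is not guaranteed to match the input order.

```cpp
#include <vector>

// Hypothetical stand-in for cv::DMatch (only the fields used here).
struct Match {
    float distance;
    int queryIdx;
    int trainIdx;
};

std::vector<Match> filterMatchesCritical(const std::vector<Match>& matches, float threshold) {
    std::vector<Match> good;  // shared result
    const long n = static_cast<long>(matches.size());

    #pragma omp parallel
    {
        std::vector<Match> local;  // declared inside the region, so private per thread
        // 'nowait' lets a thread go straight to the merge once its share is done.
        #pragma omp for nowait
        for (long i = 0; i < n; ++i) {
            if (matches[i].distance < threshold) {
                local.push_back(matches[i]);
            }
        }
        // One thread at a time appends its private results to the shared vector.
        #pragma omp critical
        good.insert(good.end(), local.begin(), local.end());
    }
    return good;
}
```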