Efficient way for SIFT descriptor matching

Published 2019-03-29 02:16

Question:

There are 2 images A and B. I extract the keypoints (a[i] and b[i]) from them.
I wonder how can I determine the matching between a[i] and b[j], efficiently?

The obvious method that comes to mind is to compare each keypoint in A with each keypoint in B, but that is too time-consuming for large image databases. How can I compare a[i] with only a small set of candidates b[k]?

I heard that a kd-tree may be a good choice. Is that right? Are there any good examples of using a kd-tree for this?

Any other suggestions?

Answer 1:

A kd-tree stores the trained descriptors in a structure that makes finding the most similar descriptor much faster when performing the matching.

With OpenCV it is easy to use a kd-tree. Here is an example for the FLANN matcher:

flann::GenericIndex< cvflann::L2<float> > *tree; // the FLANN search tree (SIFT descriptors are CV_32F, so use L2<float>)
tree = new flann::GenericIndex< cvflann::L2<float> >(descriptors, cvflann::KDTreeIndexParams(4)); // 4 randomized kd-trees

Then, when you do the matching:

const cvflann::SearchParams params(32); // number of leaves to check; higher is more accurate but slower
tree->knnSearch(queryDescriptors, indices, dists, 2, params); // 2 nearest neighbours per query descriptor
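For intuition about what the index is doing, here is a minimal, self-contained sketch of the kd-tree idea in 2-D. This is a toy illustration, not the FLANN implementation (all names here are hypothetical helpers); real SIFT descriptors are 128-dimensional, where FLANN uses several randomized trees and approximate search instead of the exact search below:

```cpp
#include <algorithm>
#include <vector>

struct Point { double x[2]; };

struct Node {
    Point p;
    int axis = 0;                       // splitting dimension at this node
    Node *left = nullptr, *right = nullptr;
};

// Build by splitting on alternating axes at the median (std::nth_element
// places the median element without fully sorting the range).
Node* build(std::vector<Point>& pts, int lo, int hi, int depth) {
    if (lo >= hi) return nullptr;
    int axis = depth % 2, mid = (lo + hi) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a.x[axis] < b.x[axis]; });
    Node* n = new Node{pts[mid], axis};
    n->left  = build(pts, lo, mid, depth + 1);
    n->right = build(pts, mid + 1, hi, depth + 1);
    return n;
}

double dist2(const Point& a, const Point& b) {
    double dx = a.x[0] - b.x[0], dy = a.x[1] - b.x[1];
    return dx * dx + dy * dy;
}

// Descend toward the query first; visit the far subtree only if the
// splitting plane is closer than the best match found so far. That
// pruning is where the speedup over brute force comes from.
void nearest(Node* n, const Point& q, Node*& best, double& bestD2) {
    if (!n) return;
    double d2 = dist2(n->p, q);
    if (d2 < bestD2) { bestD2 = d2; best = n; }
    double delta = q.x[n->axis] - n->p.x[n->axis];
    Node* nearSide = delta < 0 ? n->left : n->right;
    Node* farSide  = delta < 0 ? n->right : n->left;
    nearest(nearSide, q, best, bestD2);
    if (delta * delta < bestD2)          // otherwise the far subtree is pruned
        nearest(farSide, q, best, bestD2);
}
```

(Nodes are leaked here for brevity; a real implementation would own them properly.)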


Answer 2:

The question is whether you actually want to determine a keypoint matching between two images, or to calculate a similarity measure.

If you want to determine a matching, then I'm afraid you will have to brute-force search through all possible descriptor pairs between the two images (there are more advanced methods such as FLANN -- Fast Approximate Nearest Neighbor search -- but the speedup is not significant if you have fewer than around 2000 keypoints per image, at least in my experience). To get a more accurate matching (not faster, just better matches), I suggest you take a look at:

  • D.G. Lowe. Distinctive image features from scale-invariant keypoints -- the comparison with the second closest match
  • J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos -- the section about Spatial consistency
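The second-closest-match comparison from Lowe's paper (often called the ratio test) is easy to add on top of a brute-force search. Here is a self-contained sketch, using plain float vectors rather than cv::Mat (the function names are illustrative; 0.8 is the distance ratio suggested in the paper):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
double dist2(const Descriptor& a, const Descriptor& b) {
    double s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Brute-force matching with Lowe's ratio test: a match is kept only if the
// best distance is clearly smaller than the second-best one. Returns, for
// each query descriptor, the index of its match in `train`, or -1 if the
// match is ambiguous.
std::vector<int> ratioTestMatch(const std::vector<Descriptor>& query,
                                const std::vector<Descriptor>& train,
                                double ratio = 0.8) {
    std::vector<int> matches(query.size(), -1);
    for (std::size_t i = 0; i < query.size(); ++i) {
        double best = std::numeric_limits<double>::max(), second = best;
        int bestJ = -1;
        for (std::size_t j = 0; j < train.size(); ++j) {
            double d = dist2(query[i], train[j]);
            if (d < best)        { second = best; best = d; bestJ = (int)j; }
            else if (d < second) { second = d; }
        }
        // Ratio test on Euclidean (not squared) distances.
        if (bestJ >= 0 && std::sqrt(best) < ratio * std::sqrt(second))
            matches[i] = bestJ;
    }
    return matches;
}
```

The same filter works unchanged on the two distances returned by a 2-NN kd-tree query, so it combines naturally with the FLANN approach from Answer 1.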

If, on the other hand, you want only a similarity measure over a large database, then the appropriate place to start would be:

  • D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree -- they use a hierarchical structure called a vocabulary tree to calculate a similarity measure between a query image and the images in a large database
  • J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos -- the same paper as above, but it is very helpful for understanding the approach of Nistér and Stewénius
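The scoring idea shared by both papers can be sketched in a heavily simplified form. The sketch below assumes descriptors have already been quantised to visual-word ids (a flat vocabulary; the real vocabulary tree does this quantisation with hierarchical k-means and scores via inverted files, which this toy version omits), and compares images by tf-idf weighted cosine similarity:

```cpp
#include <cmath>
#include <map>
#include <vector>

using Image = std::vector<int>;  // visual-word ids of one image's descriptors

// Inverse document frequency over the database: rare words weigh more.
std::map<int, double> computeIdf(const std::vector<Image>& db) {
    std::map<int, int> docFreq;
    for (const Image& img : db) {
        std::map<int, bool> seen;
        for (int w : img)
            if (!seen[w]) { seen[w] = true; ++docFreq[w]; }
    }
    std::map<int, double> idf;
    for (const auto& [w, df] : docFreq)
        idf[w] = std::log((double)db.size() / df);
    return idf;
}

// Cosine similarity between the tf-idf vectors of two images.
double similarity(const Image& a, const Image& b,
                  const std::map<int, double>& idf) {
    auto tfidf = [&](const Image& img) {
        std::map<int, double> v;
        for (int w : img) {
            auto it = idf.find(w);
            if (it != idf.end()) v[w] += it->second;  // tf * idf, accumulated
        }
        return v;
    };
    std::map<int, double> va = tfidf(a), vb = tfidf(b);
    double dot = 0, na = 0, nb = 0;
    for (const auto& [w, x] : va) {
        na += x * x;
        auto it = vb.find(w);
        if (it != vb.end()) dot += x * it->second;
    }
    for (const auto& [w, x] : vb) nb += x * x;
    return (na == 0 || nb == 0) ? 0.0 : dot / (std::sqrt(na) * std::sqrt(nb));
}
```

An identical image pair scores 1, images with no shared words score 0, and everything else falls in between; in the vocabulary-tree paper the hierarchy makes both quantisation and scoring scale to databases of a million images.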


Answer 3:

In OpenCV there are several strategies implemented for matching sets of keypoints. Have a look at the documentation on the Common Interfaces of Descriptor Matchers.