Bag of feature: how to create the query histogram?

2019-07-24 16:18发布

问题:

I'm trying to implement the Bag of Features model.

Given a descriptors matrix object (representing an image) belonging to the initial dataset, compute its histogram is easy, since we already know to which cluster each descriptor vector belongs to from k-means.

But what about if we want to compute the histogram of a query matrix? The only solution that crosses my mind is to compute the distance between each vector descriptor to each of the k cluster centroids.

This can be inefficient: supposing that k=100 (so 100 centroids), then we have an query image represented through 1000 SIFT descriptors, so a matrix 1000x100.

What we have to do now is computing 1000 * 100 eucledian distances in 128 dimensions. This seems really inefficient.

How to solve this problem?

NOTE: can you suggest me some implementations where this point is explained?

NOTE: I know LSH is a solution (since we are using high-dim vectors), but I don't think that actual implementations use it.

UPDATE:

I was talking with a collegue of mine: using a hierarchical cluster approach instead of classic k-means, should speed up the process so much! Is it correct to say that if we have k centroids, with an hierarchical cluster we have to do only log(k) comparisons in order to find the closest centroid instead of k comparisons?

回答1:

For a bag of features approach, you indeed need to quantize the descriptors. Yes, if you have 10000 features and 100 features that 10000*100 distances (unless you use an index here). Compare this to comparing each of the 10000 features to each of the 10000 features of each image in your database. Does it still sound that bad?