I've already read this question, but unfortunately it didn't help.
What I don't understand is what we do once we know which bucket to assign our high-dimensional query vector q to: suppose that, using our family of locality-sensitive hash functions h_1, h_2, ..., h_n, we have translated q into a low-dimensional (n-dimensional) hash code c. Then c is the index of the bucket that q is assigned to, and where (hopefully) its nearest neighbors are also assigned; let's say there are 100 vectors in that bucket.
Now, to find q's nearest neighbors, we compute the distance between q and only these 100 vectors, is that correct? So c is used only for indexing (only for deciding which bucket to assign q to), right?
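To make that concrete, here is a minimal sketch of the "hash table lookup" idea, assuming a sign-based (random hyperplane) LSH family; the names hash_code, buckets, data, and the sizes are all hypothetical, not from any particular library:

```python
# Minimal sketch of "hash table lookup" with a sign-based (random
# hyperplane) LSH family. All names and sizes here are hypothetical.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, n_bits = 128, 8                        # original dimension, hash code length
hyperplanes = rng.standard_normal((n_bits, d))

def hash_code(v):
    """Map a d-dimensional vector to an n_bits-bit integer code c."""
    bits = (hyperplanes @ v) > 0          # one bit per hyperplane
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code

# Index phase: c is used only to decide which bucket each vector goes into.
data = rng.standard_normal((10_000, d))
buckets = defaultdict(list)
for i, x in enumerate(data):
    buckets[hash_code(x)].append(i)

def query(q, k=5):
    """Fetch only the vectors in q's bucket, then rank them by exact distance."""
    candidates = buckets[hash_code(q)]
    dists = [(np.linalg.norm(data[i] - q), i) for i in candidates]
    return sorted(dists)[:k]
```

The bucket lookup itself is cheap; the query cost is dominated by the exact distance computations over the (hopefully small) candidate set in that one bucket.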
An alternative to "hash table lookup" (the approach described above), as described in this survey (section 2.2), is "fast distance approximation": we perform an exhaustive search in which we compute the distance between c and the hash code generated for each element in the dataset. This is supposed to be fast because the hash codes live in a low-dimensional space AND the distance itself is faster to compute (for example, if the hash code space is binary, we can use the XOR operator to quickly compute the Hamming distance between two hash codes).
Now, what I wonder is: what are the advantages and disadvantages of the two methods? Why should I use one approach rather than the other?