I am studying FLANN, a library for approximate nearest neighbors search.
For the LSH method they represent an object (point in search space), as an array of unsigned int. I am not sure why they do this, and not represent a point simply as a double array (which would represent a point in multi-dimensional vector space). Maybe because LSH is used for binary features? Can someone share more about the possible use of unsigned int in this case? Why unsigned int if you only need a 0 and 1 for each feature?
Thanks
Please note that I will refer to the latest FLANN release, i.e.
flann-1.8.3
at the time of writing.No: this is wrong. The
LshIndex
class includes abuildIndexImpl
method that implements the LSH indexing. Since the LSH is basically a collection of hash tables, the effective indexing occurs on theLshTable
class.The elementary indexing method, i.e. the method that indexes one feature vector (aka descriptor, or point) at a time is:
Note: the
buildIndexImpl
method uses the alternative version that simply iterates over the features, and call the above method on each.As you can see this method has 2 arguments which is a pair
(ID, descriptor)
:value
which isunsigned int
represents the feature vector unique numerical identifier (aka feature index)feature
represents the feature vector itselfIf you look at the implementation you can see that the first step consists in hashing the descriptor value to obtain the related bucket key (= the identifier of the slot pointing to the bucket in which this descriptor ID will be stored):
In practice the
getKey
hashing function is only implemented for binary descriptors, i.e. descriptors that can be represented as an array ofunsigned char
:Yes: as stated above, the FLANN LSH implementation works in the Hamming space for binary descriptors.
If you were to use descriptors with real values (in
R**d
) you should refer to the original paper that includes details about how to convert the feature vectors into binary strings so as to use the Hamming space and hash functions.See above: the
unsigned int
value is only used to store the related ID of each feature vector.