In LSH, you hash slices of the documents into buckets. The idea is that documents that fall into the same bucket are potentially similar, and are therefore possible nearest neighbors.
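Here is a minimal sketch of the banding idea described above. It assumes MinHash signatures over character 3-shingles, and the shingle size, signature length, number of bands, and all function names (`shingles`, `minhash`, `lsh_buckets`, `candidate_pairs`) are illustrative choices, not taken from the question. Each band's slice of the signature is used directly as a dictionary key. Every distinct slice therefore gets its own bucket, and two documents collide only if that whole slice matches.

```python
import random
import zlib
from collections import defaultdict

# Illustrative parameters (assumptions, not from the question).
NUM_HASHES = 100             # length of each MinHash signature
BANDS = 20                   # b: number of bands ("slices")
ROWS = NUM_HASHES // BANDS   # r: rows per band (5 here)
PRIME = 2_147_483_647        # Mersenne prime 2**31 - 1 for universal hashing


def shingles(text, k=3):
    """Set of character k-shingles of a document (assumes len(text) >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def make_hash_funcs(n, seed=42):
    """n random (a, b) pairs defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(shingle_set, hash_funcs):
    """MinHash signature: for each hash function, the minimum hash over the shingles."""
    # crc32 gives a stable integer id per shingle (Python's hash() is salted per process).
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs]


def lsh_buckets(signatures, bands, rows):
    """One bucket table per band; the band's slice of the signature is the bucket key."""
    tables = [defaultdict(list) for _ in range(bands)]
    for doc_id, sig in signatures.items():
        for band in range(bands):
            key = tuple(sig[band * rows:(band + 1) * rows])
            tables[band][key].append(doc_id)
    return tables


def candidate_pairs(tables):
    """All pairs of documents that share a bucket in at least one band."""
    pairs = set()
    for table in tables:
        for ids in table.values():
            for i in range(len(ids)):
                for j in range(i + 1, len(ids)):
                    pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumped over the lazy dog",
    "c": "lorem ipsum dolor sit amet consectetur",
}
funcs = make_hash_funcs(NUM_HASHES)
sigs = {doc_id: minhash(shingles(text), funcs) for doc_id, text in docs.items()}
pairs = candidate_pairs(lsh_buckets(sigs, BANDS, ROWS))
print(pairs)
```

With these parameters, the near-duplicates "a" and "b" are expected to end up as a candidate pair, while the unrelated "c" is not.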
For 40,000 documents, what is a good value (roughly) for the number of buckets?
I currently have it as number_of_buckets = 40000 / 4 (i.e. 10,000), but I feel it can be reduced further.
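To see what happens when the bucket count is reduced, here is a rough back-of-the-envelope calculation. It assumes band keys are hashed uniformly at random into a fixed array of `num_buckets` buckets, which is a hypothetical setup for illustration. Under that assumption, each of the C(N, 2) document pairs lands in the same bucket of a given band purely by chance with probability 1/num_buckets. Separately, the standard banding formula 1 - (1 - s^r)^b gives the probability that a pair with Jaccard similarity s shares a bucket in at least one of b bands of r rows. The function names here are illustrative.

```python
from math import comb


def expected_spurious_pairs(num_docs, num_buckets):
    """Expected number of document pairs sharing a bucket purely by chance
    in one band, assuming band keys are hashed uniformly into num_buckets."""
    return comb(num_docs, 2) / num_buckets


def candidate_probability(s, bands, rows):
    """Probability that two documents with Jaccard similarity s share a bucket
    in at least one band (b bands of r rows), ignoring spurious collisions."""
    return 1 - (1 - s ** rows) ** bands


N = 40_000
for buckets in (1_000, 10_000, 100_000, 1_000_000):
    # prints 1000 799980, 10000 79998, 100000 8000, 1000000 800 (one per line)
    print(buckets, round(expected_spurious_pairs(N, buckets)))

for s in (0.2, 0.5, 0.8):
    print(s, candidate_probability(s, bands=20, rows=5))
```

Under these assumptions, shrinking the bucket array mainly adds chance collisions that have to be filtered out later. The choice of b and r is what shifts the similarity threshold at which pairs become candidates.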
Any ideas, please?
Related: How to hash vectors into buckets in Locality Sensitive Hashing (using jaccard distance)?