Hashing Similarity

2019-03-27 14:01发布

问题:

Normally, the goal of hashing is to turn a continuous function into a discrete one: a small change in the input should cause a large change in the output. However, is there any hashing algorithm that will, (very) roughly speaking, return similar but (still different) hashes for similar inputs?

(An example of the use of this would be to check whether two files are "similar" by checking their hashes for similarity. Of course, some failure is always acceptable.)

回答1:

Look at Locality Sensitive Hashing (LSH). That is a probabilistic way of quickly finding a bunch of points near a given one, for example.



回答2:

Given a distance function that tells you how similar or different are your objects, you can also employ distance permutations: http://www.computer.org/portal/web/csdl/doi/10.1109/TPAMI.2007.70815 or sketches: http://portal.acm.org/citation.cfm?id=1638180

For an implementation of the latter approach: http://obsearch.net



标签: hash