Hashing Similarity

Posted 2019-03-27 14:04

Normally, a good hash function is deliberately discontinuous: a small change in the input should cause a large change in the output. However, is there any hashing algorithm that will, (very) roughly speaking, return similar (but still different) hashes for similar inputs?

(One use for this would be to check whether two files are "similar" by comparing their hashes. Of course, occasional wrong answers are acceptable.)

Tags: hash
2 Answers
狗以群分
#2 · 2019-03-27 14:37

Given a distance function that tells you how similar or different your objects are, you can also use distance permutations: http://www.computer.org/portal/web/csdl/doi/10.1109/TPAMI.2007.70815 or sketches: http://portal.acm.org/citation.cfm?id=1638180

For an implementation of the latter approach: http://obsearch.net
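
To make the distance-permutation idea concrete, here is a minimal sketch in Python. It is not taken from the linked paper or from OBSearch; the names `distance_permutation`, `footrule`, and `euclidean`, the four pivots, and the sample points are all assumptions made for illustration. Each object is summarized by the order in which it "sees" a fixed set of reference objects (pivots), nearest first, and two summaries are compared with the Spearman footrule.

```python
def distance_permutation(obj, pivots, dist):
    """Return pivot indices ordered from nearest to farthest from obj.

    Ties are broken by pivot index so the result is deterministic.
    """
    return tuple(sorted(range(len(pivots)),
                        key=lambda i: (dist(obj, pivots[i]), i)))


def footrule(p, q):
    """Spearman footrule: sum over pivots of how far each one moved in rank."""
    rank_in_q = {pivot: rank for rank, pivot in enumerate(q)}
    return sum(abs(rank - rank_in_q[pivot]) for rank, pivot in enumerate(p))


def euclidean(a, b):
    """Plain Euclidean distance; any other distance function could be used."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical pivots: the four corners of a 10 x 10 square.
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

a = distance_permutation((1.0, 2.0), pivots, euclidean)
b = distance_permutation((1.5, 2.5), pivots, euclidean)  # close to a
c = distance_permutation((9.0, 8.0), pivots, euclidean)  # far from a

print(a, b, footrule(a, b))
print(a, c, footrule(a, c))
```

Nearby objects tend to produce the same or nearly the same permutation, while distant ones produce very different permutations, so the permutation acts as a compact, comparable signature. With only a few pivots, close objects often get identical permutations; more pivots give finer resolution.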

Emotional °昔
#3 · 2019-03-27 14:59

Look at locality-sensitive hashing (LSH). It hashes items so that similar ones are likely to receive the same or similar hashes, which gives, for example, a probabilistic way of quickly finding a bunch of points near a given one.
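
As a rough illustration of the idea, the sketch below implements SimHash, one well-known scheme from the LSH family, for whitespace-separated text. The choice of SimHash, the 64-bit width, the use of MD5 as the per-token hash, and the function names `simhash` and `hamming` are assumptions made for this example, not something the answer prescribes. Similar texts share most tokens, so most bit votes agree and the resulting fingerprints tend to differ in only a few bits.

```python
import hashlib

BITS = 64  # fingerprint width; an arbitrary choice for this sketch


def simhash(text):
    """Return a 64-bit SimHash fingerprint of a whitespace-tokenized text."""
    votes = [0] * BITS
    for token in text.lower().split():
        # Take the first 8 bytes of MD5 as a 64-bit hash of the token.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive overall.
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumped over the lazy dog"
doc3 = "locality sensitive hashing maps similar items to nearby codes"

print(hamming(simhash(doc1), simhash(doc2)))  # expected to be small
print(hamming(simhash(doc1), simhash(doc3)))  # expected to be larger
```

To decide whether two files are "similar", compare the Hamming distance between their fingerprints against a threshold chosen for your data. Because the method is probabilistic, some false matches and misses should be expected.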
