Hash function on a list, independent of the order of items

I want to have a dictionary that assigns a value to a set of integers.

For example, the key is [1 2 3] and it maps to some value.

The thing is that [3 2 1] needs to be treated as the same key in my case, so the hashes need to be equal if I go with a hash-based approach.

The set will have 2 to 10 items.

The sum of the items is usually fixed, so we cannot build the hash code from the sum, which is the first natural idea here.

Not a homework task, actually facing this problem in my code.

This set is basically an IEnumerable<int> in C#, so any data structure is fine for storing the items.

Any help appreciated. Performance is pretty important here too.

An immediate thought: we could sum up items^2 and already get a somewhat better hash, but I would still like to hear some thoughts.

EDIT: Really sorry, everyone suggests ordering. It didn't occur to me to say that sorting and then hashing is the solution I currently use, and I am looking for faster alternatives.

Tags: c# algorithm hash-function

9 answers

ら.Afraid

#2 · 2019-02-23 15:11

Use a HashSet<T> and HashSet<T>.CreateSetComparer(), which returns an IEqualityComparer<HashSet<T>>.
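
A minimal sketch of this suggestion, assuming the items in each key are distinct (a HashSet<int> discards duplicates, so [1 1 2] and [1 2] would end up as the same key). The class name SetKeyedLookup and the string value type are made up for illustration.

```csharp
using System.Collections.Generic;

public static class SetKeyedLookup
{
    // Builds a dictionary whose keys are compared by set content, not by reference.
    public static Dictionary<HashSet<int>, string> Create()
    {
        return new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());
    }
}
```

With this in place, a value stored under new HashSet<int> { 1, 2, 3 } can be looked up with new HashSet<int> { 3, 2, 1 }.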


唯我独甜

#3 · 2019-02-23 15:12

I think what is mentioned in this paper would definitely help:

http://people.csail.mit.edu/devadas/pubs/mhashes.pdf

Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking

Abstract: We introduce a new cryptographic tool: multiset hash functions. Unlike standard hash functions which take strings as input, multiset hash functions operate on multisets (or sets). They map multisets of arbitrary finite size to strings (hashes) of fixed length. They are incremental in that, when new members are added to the multiset, the hash can be updated in time proportional to the change. The functions may be multiset-collision resistant in that it is difficult to find two multisets which produce the same hash, or just set-collision resistant in that it is difficult to find a set and a multiset which produce the same hash.


家丑人穷心不美

#4 · 2019-02-23 15:18

Create your own type that implements IEnumerable<T>.

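Override GetHashCode (and Equals). In it, sort a copy of your collection and combine the hash codes of the elements in sorted order. Calling ToArray().GetHashCode() will not do this, because arrays in .NET hash by reference rather than by content.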
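
A rough sketch of such a type, under a few assumptions: the name SortedKey and the 17/31 combining constants are illustrative choices, and the IEnumerable<T> implementation is left out for brevity. The hash is computed once in the constructor, which assumes the key never changes afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type: keeps a sorted copy so that equal multisets compare equal.
public sealed class SortedKey : IEquatable<SortedKey>
{
    private readonly int[] _items;
    private readonly int _hash;

    public SortedKey(IEnumerable<int> items)
    {
        _items = items.ToArray();
        Array.Sort(_items);
        // Combine the element values in sorted order; wrap on overflow instead of throwing.
        int h = 17;
        unchecked
        {
            foreach (int x in _items) h = h * 31 + x;
        }
        _hash = h;
    }

    public bool Equals(SortedKey other) =>
        other != null && _items.SequenceEqual(other._items);

    public override bool Equals(object obj) => Equals(obj as SortedKey);

    public override int GetHashCode() => _hash;
}
```

This is the sort-then-hash approach the question already uses, so it serves as a baseline rather than a speedup.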


Emotional °昔

#5 · 2019-02-23 15:19

If the range of the values in key happens to be limited to low-ish positive integers, you could map each one to a prime number using a simple lookup, then multiply them together to arrive at the value.

Using the example in the question:

[1, 2, 3] maps to 2 x 3 x 5 = 30
[3, 2, 1] maps to 5 x 3 x 2 = 30
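
A small sketch of the prime lookup, assuming every item lies in [1, 10]; the class name PrimeProductHash and the table size are assumptions, not part of the answer.

```csharp
using System;
using System.Collections.Generic;

public static class PrimeProductHash
{
    // Assumption: every item lies in [1, 10]; extend the table for a wider range.
    private static readonly int[] Primes = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };

    public static int Compute(IEnumerable<int> items)
    {
        int product = 1;
        foreach (int x in items)
        {
            if (x < 1 || x > Primes.Length)
                throw new ArgumentOutOfRangeException(nameof(items), "item outside the prime table");
            // The product can exceed int.MaxValue; wrapping keeps it order independent
            // but gives up the uniqueness guarantee.
            unchecked { product *= Primes[x - 1]; }
        }
        return product;
    }
}
```

As long as the product does not overflow, unique factorization means two different multisets can never share a product. With up to 10 items that bound is easy to exceed in 32 bits, at which point the result is only a hash and equality still has to be checked separately.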


#6 · 2019-02-23 15:20

Why not something like

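// requires: using System.Collections.Generic;
public static int GetOrderIndependentHashCode(IEnumerable<int> source)
{
    int hash = 0;
    foreach (int x in source)
    {
        // unchecked: let the powers wrap around; LINQ's Sum() would throw on overflow.
        unchecked { hash += x * x + x * x * x + x * x * x * x; }
    }
    return hash & 0x7FFFFF; // keep the low 23 bits, so the result is never negative
}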


Lonely孤独者°

#7 · 2019-02-23 15:25

Basically all of the approaches here are instantiations of the same template. Map x₁, …, x_n to f(x₁) op … op f(x_n), where op is a commutative associative operation on some set X, and f is a map from items to X. This template has been used a couple of times in ways that are provably good.

- Choose a random large prime p and a random residue b in [1, p - 1]. Let f(x) = b^x mod p and let op be addition. We essentially interpret a set as a polynomial and use the Schwartz–Zippel lemma to bound the probability of a collision (= the probability that a nonzero polynomial has b as a root mod p).
- Let op be XOR and let f be a randomly chosen table. This is Zobrist hashing and minimizes in expectation the number of collisions by straightforward linear-algebraic arguments.
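
The table-based variant could look roughly like this. It is a sketch that assumes the items fall in [0, range); the class name ZobristHasher and the seed parameter (there only to make runs repeatable) are illustrative.

```csharp
using System;
using System.Collections.Generic;

public sealed class ZobristHasher
{
    private readonly int[] _table;

    // Assumption: items lie in [0, range). The seed only makes runs repeatable.
    public ZobristHasher(int range, int seed)
    {
        var rng = new Random(seed);
        _table = new int[range];
        var buffer = new byte[4];
        for (int i = 0; i < range; i++)
        {
            rng.NextBytes(buffer);                        // 32 random bits per entry
            _table[i] = BitConverter.ToInt32(buffer, 0);
        }
    }

    public int Compute(IEnumerable<int> items)
    {
        int hash = 0;
        foreach (int x in items) hash ^= _table[x];       // XOR commutes, so order is irrelevant
        return hash;
    }
}
```

One caveat with XOR: a value that appears twice cancels itself out, so this fits true sets. If duplicates can occur, wrapping addition is the usual substitute.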

Modular exponentiation is slow, so don't use it. As for Zobrist hashing, with 3 million items, the table f probably won't fit into L2, though it does set an upper bound of one main-memory access.

I would instead take Zobrist hashing as a departure point and look for a cheap function f that behaves like a random function. This is essentially the job description of a non-cryptographic pseudorandom generator – I would try computing f by seeding a fast PRG with x and generating one value.

EDIT: given that the sets all have the same sums, don't choose f to be a degree 1 polynomial (e.g., the step function of a linear congruential generator).
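
As a sketch of that idea, the following replaces the random table with a cheap nonlinear mixing step. The mixer is the 32-bit finalizer from MurmurHash3, picked here only as one example of a fast function that is not a degree 1 polynomial; the class name MixedSumHash, the offset constant, and the use of wrapping addition instead of XOR (so that duplicates do not cancel) are all assumptions of this sketch.

```csharp
using System.Collections.Generic;

public static class MixedSumHash
{
    // 32-bit finalizer from MurmurHash3, used as a stand-in for a "random" function f.
    private static uint Mix(uint h)
    {
        unchecked
        {
            h ^= h >> 16;
            h *= 0x85ebca6bu;
            h ^= h >> 13;
            h *= 0xc2b2ae35u;
            h ^= h >> 16;
            return h;
        }
    }

    public static int Compute(IEnumerable<int> items)
    {
        uint hash = 0;
        unchecked
        {
            // The offset is an arbitrary constant so that item 0 does not map to 0.
            foreach (int x in items) hash += Mix((uint)x + 0x9e3779b9u);
            return (int)hash;
        }
    }
}
```

This does a single pass with no sorting and no allocation. Whether it beats the current sort-and-hash solution on 2 to 10 items is something to measure rather than assume.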

