Is there a hash function that produces unique hash codes (like MD5) and also preserves order?
NOTE: I don't care about security. I need it for sorting: I have a lot of chunks (~1 MB each) and I want to sort them. Of course I can use an index sort, but I want to reduce the comparison time.
Theoretically: if I have 1'000'000 chunks of 1 MB (1'048'576 bytes) each, and they all differ only in the last 10 bytes, then comparing one chunk to another takes O(k - 10) byte comparisons (where k is the chunk size). If I use Quicksort (which makes ~n*log2(n) comparisons), the total comparison cost will be about n*log2(n)*(k - 10) = 1'000'000 * 20 * (1'048'576 - 10).
That's why I want to generate fixed-size order-preserving hash codes (for example, 16 bytes) once, then sort the chunks and save the result (for example, in a file).
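For scale, the estimate above can be worked out with a few lines of Python. This is only a sketch of the arithmetic in the question; the function name and its parameters are made up for illustration, and log2(n) is rounded to 20 as in the question unless the exact value is requested.

```python
import math


def estimated_byte_comparisons(n, k, tail, log_n=None):
    """Rough cost of sorting n chunks of k bytes that differ only in the last `tail` bytes.

    Assumes ~n*log2(n) chunk comparisons, each scanning k - tail equal bytes
    before reaching the first difference (the model used in the question).
    """
    if log_n is None:
        log_n = math.log2(n)  # exact value instead of the rounded 20
    return n * log_n * (k - tail)


if __name__ == "__main__":
    n, k = 1_000_000, 1_048_576
    print(estimated_byte_comparisons(n, k, 10, log_n=20))  # rounded, as in the question
    print(estimated_byte_comparisons(n, k, 10))            # with the exact log2(n)
    print(n * k)                                           # bytes read when each chunk is scanned once
```

With the rounded logarithm this is about 2.1 * 10^13 byte comparisons, versus roughly 10^12 bytes for a single pass over every chunk.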
According to NIST (I'm no expert) a Pearson hash can be order-preserving. The hash uses an auxiliary table. Such a table can (in theory) be constructed such that the resulting hash is order preserving.
It doesn't meet your full requirements though, because it doesn't reduce the size as you would like. I'm posting this in case other people are looking for a solution.
Some pointers:
CHM (Z.J. Czech, G. Havas, and B.S. Majewski) is an algorithm which generates a minimal perfect hash that preserves ordering (i.e. if A < B, then h(A) < h(B)). It uses approximately 8 bytes of storage per key.
See: http://cmph.sourceforge.net/chm.html
In the general case, such a function is impossible unless the size of the hash is at least the size of the object.
The argument is trivial: if there are N possible objects but only M < N hash values, then by the pigeonhole principle two different objects are mapped to the same hash value, and so their order is not preserved.
If, however, additional properties of the objects are guaranteed or the requirements are relaxed, a custom or probabilistic solution may become possible.
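As one illustration of a relaxed requirement (my own sketch, not something from the question): a fixed-size prefix of each chunk is a weakly order-preserving key. If A <= B then prefix(A) <= prefix(B), but distinct chunks may collide, exactly as the pigeonhole argument predicts, so ties still need a full comparison. The names prefix_key and sort_chunks are hypothetical.

```python
def prefix_key(chunk: bytes, size: int = 16) -> bytes:
    """Hypothetical helper: the first `size` bytes of the chunk.

    Truncation is weakly monotone under lexicographic order:
    a <= b implies prefix_key(a) <= prefix_key(b). Distinct chunks
    can still share the same key.
    """
    return chunk[:size]


def sort_chunks(chunks: list[bytes], size: int = 16) -> list[bytes]:
    # Tuples compare element by element, so the full chunk is only
    # examined when two prefix keys are equal.
    return sorted(chunks, key=lambda c: (prefix_key(c, size), c))


if __name__ == "__main__":
    chunks = [b"delta-2", b"alpha", b"delta-1", b"charlie"]
    print(sort_chunks(chunks, size=5))
```

This only pays off when chunks tend to differ early. For chunks that differ only in their last few bytes, every prefix key collides and nothing is gained.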
In theory there is no such thing. If you want, you can create a composed hash:
I think this will resolve your needs.
Sorting an array of N strings, each of length K, can be done in just O(NK) or O(N^2 + NK) character comparisons. For example, construct a trie.
Or do a kind of insertion sort. Construct the set of sorted strings S by adding strings to it one by one. For each new string P, traverse it, maintaining the (non-decreasing) index of the greatest string Q in S such that Q <= P. When the string P ends, insert it into S just after Q. Each of the O(N) insertions can be done in O(N + K) operations: the index is increased O(N) times in total, distributed over the K characters.
When you have the indices of the strings in sorted order, just use them for your purposes instead of the "hashes" you want.
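A minimal sketch of that last step, assuming all chunks fit in memory and using Python's built-in sort instead of the trie or insertion scheme described above. The function name order_ranks is made up for illustration. The rank of each chunk in sorted order serves as a small, fixed-size, order-preserving code, valid only for this fixed set of chunks.

```python
def order_ranks(chunks: list[bytes]) -> list[int]:
    """Return ranks[i], the position of chunks[i] in sorted order.

    Equal chunks share a rank, so for chunks of this set:
    chunks[i] < chunks[j]  exactly when  ranks[i] < ranks[j].
    """
    # Indices of the chunks in sorted order (an "index sort").
    order = sorted(range(len(chunks)), key=chunks.__getitem__)

    ranks = [0] * len(chunks)
    rank = 0
    for pos, idx in enumerate(order):
        # Advance the rank only when the chunk differs from its predecessor.
        if pos > 0 and chunks[idx] != chunks[order[pos - 1]]:
            rank += 1
        ranks[idx] = rank
    return ranks


if __name__ == "__main__":
    chunks = [b"pear", b"apple", b"fig", b"apple"]
    print(order_ranks(chunks))  # [2, 0, 1, 0]
```

The ranks can be saved next to the chunks (for example, in a file) and compared as plain integers afterwards. Adding a new chunk later invalidates them, so they have to be recomputed.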