Confused about hashes

2019-02-10 14:09发布

say I have a blob of text 5000 characters. I run it through a hashing program and generates a 40 char long hash. now i run another blob of text, 10000 characters. it still generates a hash 40 chars long. that's true for text of any length.

my question is if the hashes are all unique, wouldn't i be able to compress anything into a 40 char string?

标签: hash
12条回答
趁早两清
2楼-- · 2019-02-10 14:40

Hashing is for spreading as good as possible, not for uniqueness!

Of course, reaching uniqueness is a reaching a 100% spreading, but that's often not possible, no matter how good your hashing algorithm.

Striking example :

C# for example provide an Int32 code for each object as HashCode... So also for Int64 :

       Int64 a = Int64.MaxValue;
       Int32 myHash =  a.GetHashCode();

Conclusion here : there are 2^64 different possible instances of Int64, but only 2^32 hashcodes for them!!

So : each hashvalue for an Int64 is shared by (average)

4 294 967 295

other Int64's!

So much for uniqueness hey :-)

查看更多
smile是对你的礼貌
3楼-- · 2019-02-10 14:42

Hashing is not unique.

Hashing is a technique to attempt to generate a unique hash for each value fed to it, but it is not guaranteed unique.

Good hashing algorithms will have duplicate hash values much less frequently than bad hash algorithms. Also, hashing is one directional - meaning you can't go from a hash -> original, so it's not meant for compression.

And: A hash doesn't need to be unique. The same input needs to be tranformed into the same hash by the algorithm. You don't use a hash as identifier!

查看更多
【Aperson】
4楼-- · 2019-02-10 14:46

One way to think of a hash is like a human fingerprint (hashes are also sometimes referred to as fingerprints)..

You can "compress" any person in to a (pretty much) unique finger-print.. but, you cannot know who someone is by their fingerprint alone.. This is just like a hash, you can easily work out hash("abcdef") -> a1b2c3, but given only a1b2c3, you cannot trivially tell the source data.

To reverse a finger print, you need to compare the fingerprint to a database of known people->finger-prints (if the unknown fingerprint matches Person1, the unknown fingerprint belongs to them)

With a hash, again you must do much the same thing - you have a database with all string->hash mappings (called a rainbow table). Then you lookup the row with the hash "a1b2c3" and it shows "abcdef" was hashed in order to get this. The other more common way is to simply try every combination of characters, hash them and compare (a brute force attack)

Finally, while human fingerprints are "unique", it's possible to have two the same, it's just incredibly unlikely - it's the same with hashing... Some hashing algorithms are more susceptible to collisions than others.

my question is if the hashes are all unique, wouldn't i be able to compress anything into a 40 char string?

Theoretically hashing is a great compression method, but to decompress is incredibly impractical beyond (say) 10 ASCII characters of data.. You're right, you can compress anything to a 40 character string, but you cannot decompress it practically (even theoretically is a bit of a stretch..)

查看更多
太酷不给撩
5楼-- · 2019-02-10 14:47

If you are using a well defined hash function correctly you can practically assume that hash results are unique.

Issue, with your question is a hash is a one way function. There is no inverse function to take a value and go back to your original blob. Unless you keep a huge table of all possible original values (so called rainbow table).

查看更多
贼婆χ
6楼-- · 2019-02-10 14:51

RSA hashes are not unique. There's an extremely tiny (something to the order of 1:36^40) chance that you will create a false positive when hashing two different bits of clear text. For most applications, the chance is considered sufficiently small that you can ignore it, since it would take, on average, millions of years to see an accidental collision.

查看更多
地球回转人心会变
7楼-- · 2019-02-10 14:52

Hashing is not guaranteed to be unique, but if you are looking for a unique hash, look at gperf. It can generate a unique hashing function for a set of predetermined inputs.

查看更多
登录 后发表回答