Hash of unique value = unique hash?

2019-04-12 13:08发布

问题:

Theoretically does hashing a unique value yield a unique value?

Let's say I have a DB table with 2 columns: id and code. id is an auto-incrementing int and code is a varchar. If I do ...

$code = sha1($id);

... and then store $code into the same row as $id. Will my code column be unique as well?

What about if I append the current time? eg:

$code = sha1($id . time());

Thanks.

回答1:

In general, the answer is no. This is trivial to show: SHA-1 has 2^160 different outputs - 160 bits, but there are many more inputs that that (e.g., there are 2^320 different 40-byte strings, and they can't all map to a unique output).

Given a sufficient subset of values, the answer is maybe. It depends on the exact algorithm and the size of the subset: if the number of possible inputs is smaller than the number of possible outputs, then it is possible (but NOT guaranteed). When thinking about this, it may be helpful to keep the birthday paradox in mind: the probability of a collision does not increase linearly with the number of inputs.



回答2:

There is a small possibility that two different values give the same hash. Although very small, it's not unlikely.



回答3:

It depends on the hashing algorithm. But theoretically, unless the hash is exactly the same as the original string there is a potential for the hash to not be unique.

A hash of a value is a condensed representation of the original value. By removing pieces of information to create the hash you are losing parts of what make it unique in the domain and therefore increasing the probability that the value will not be unique. The only way to guarantee that it will be unique is to use the original value itself which defeats the purpose of hashing.



回答4:

One has to ask the question, why you would want to do this? If your database is already providing you with a unique identifier why do you need to generate another unique identifier?

You may also wish to consider that outside of PHP many database engines will generate UUID style primary keys for you if that is what you require.

The point here is that hashing algorithms such as sha1() are not intended for this type of work; they are for verifying that two (potentially very long) string inputs are the same. The chance of a collision with a similar, but not exact string is very remote but the chance of a collision with very different strings becomes far higher.