calculating which strings will have the same hash

with SHA-1 is it possible to figure out which finite strings will render equal hashes?

回答1:

What you are looking for is the solution to the Collision Problem (See also collision attack). A well-designed and powerful cryptographic hash function is designed with the intent of as much obfuscating mathematics as possible to make this problem as hard as possible.

In fact, one of the measures of a good hash function is the difficulty of finding collisions. (Among the other measures, the difficulty of reversing the hash function)

It should be noted that, in hashes where the input is any length of string and the output is a fixed-length string, the Pigeonhole Principle ensures that there is at least one collision for any given string. However, finding this string is not as easy, as it would require basically blind guess-and-check over a basically infinite collection of strings.

It might be useful to read into the the ideal hash functions. Hash functions are designed to be functions where

Small changes in the input cause radical, chaotic changes in the output
Collisions are reduced to a minimum
It is difficult or, ideally, impossible to reverse
There are no hashed values that are impossible to obtain with any inputs (this one matters significantly less for cryptographic purposes)

The theoretical "perfect" hash algorithm would be a "random oracle" -- that is, for every input, it outputs a perfectly random output, on the condition that for the same input, the output will be identical (this condition is fulfilled with magic, by the hand of Zeus and pixie fairies, or in a way that no human could ever possibly understand or figure out)

Unfortunately, this is pretty much impossible, and ultimately, all hashes are judged as "strong" based on how much of these qualities they possess, and to what degree.

A hash like SHA1 or MD5 is going to be pretty strong, and more or less computationally impossible to find collisions for (within a reasonable time frame). Ultimately, you don't need to find a hash that is impossible to find collisions for. You only practically need one where the difficulty of it is large enough that it'd be too expensive to compute (ie, on the order of a billion or a trillion years to find a collision)

Due to all hashes being imperfect, one could analyze the internal workings of it and see mathematical patterns and heuristics and try to find collisions along that pattern. This is similar to a hash function being %7...Hashing the number 13 would be 13%7 = 6, 89%7 = 5. If you saw a hash of 3, you could use your mathematical understanding of the modulus function to easily find a collision (ie, 10)¹. Fortunately for us, stronger hash functions have much, much, much harder to understand mathematical basis. (Ideally, so hard that no human would ever understand it!)

Some figures:

Finding a collision for a single given SHA-0 hash takes about 13 full days of running computations on the top supercomputers in the world, using the patterns inherent in the math.
According to a helpful commenter, MD5 collisions can be generated "quickly" enough to be less than ideal for sensitive purposes.
No feasible or practical/usable collision finding method for SHA-1 has been found or proven so far, although, as pointed out in the comments, there are some weaknesses that have been discovered.

Here is a similar SO question, which has answers much wiser than mine.

¹_{note that, while this hash function is weak for collisions, it is strong it that it is perfectly impossible to go backwards and find a given key, if your hash is, say, 4. There are an infinite amount (ie, 4, 11, 18, 25...)}

回答2:

The answer is pretty clearly yes, since at the very least you could run through every possible string of the given length, compute the hashes of all of them, and then see which are the same. The more interesting question is how to do it quickly.

Further reading: http://en.wikipedia.org/wiki/Collision_attack

回答3:

It depends on the hash function. With a simple hash function, it may be possible. For example, if the hash function simply sums the ASCII byte values of a string, then one could enumerate all strings of a given length that produce a given hash value. If the hash function is more complex and "cryptographically strong" (e.g., MD5 or SHA1), then it is theoretically not possible.

回答4:

Most hashes are of cryptographic or near-cryptographic strength, so the hash depends on the input in non-obvious ways. The way this is done professionally is with rainbow tables, which are precomputed tables of inputs and their hashes. So brute force checking is basically the only way.