What's the purpose in hashing information?

2019-03-08 21:21发布

问题:

After being taught how to create a hash table in class, I don't understand when hashing data would be useful. It seems to me that all hashing does is storing information in semi-random positions in an array. I want to know how any of the data can be made useful after it's stored.

My question is this: what are some examples where hashing information is beneficial? How is data retrieved in any organized manner? It seems to be placed in arbitrary positions where it would be difficult to retrieve.

回答1:

Hashing can be used for many purposes:

  1. It can be used to compare large amounts of data. You create the hashes for the data, store the hashes and later if you want to compare the data, you just compare the hashes.

  2. Hashes can be used to index data. They can be used in hash tables to point to the correct row. If you want to quickly find a record, you calculate the hash of the data and directly go to the record where the corresponding hash record is pointing to. (This assumes that you have a sorted list of hashes that point to the actual records)

  3. They can be used in cryptographic applications like digital signatures.

  4. Hashing can be used to generate seemingly random strings.

Here are the applications of hash functions that wikipedia lists:

  1. Finding duplicate records
  2. Finding similar records
  3. Finding similar substrings
  4. Geometric hashing

Now regarding hash table, here are some points to note:

If you are using a hash table, the hashes in the table should be in a sorted manner. If not, you will have to create an index on the hash column. Some implementations store the hash separately in a sorted manner and point to the original record.

If somebody is storing hashes in a semi-random order, it must be either because of the above reasons or because they just want to store the message digest of the information for comparing, finding duplicates etc. and not as an index to the data.



回答2:

One of the major uses of the hash tables you created in class is when you need fast O(1) lookup times. You'll have have two components, keys and the values.

The hash function transforms the key into a hash. That hash is a number, and specifically, it is the index of the data in the array.

So, when you need to look up Agscala's reputation up in a hash table and you have used your username as the key, it takes almost no time to locate and find the relevant value. It simply re-hashes your username and viola, there is the index of the data you were looking for. You didn't have to iterate over the entire array looking for that specific value.

For some reference the Wikipedia page on Hash tables is pretty good.



回答3:

There are a couple of typical reasons to hash data. In the example you reference, you would hash the data and use that as a key to extract the actual value of the hashed item. The hashed data is often referred to as a key and it references a bucket where the actual, non-hashed value can be found.

The other typical reason is to create a signature of the hashed value so that you can check if the value has been changed by someone else. Since it's usually rare, depending on the algorithm used, to have two items hash to the same value, you can rehash a value and compare it to the saved hash value to check if the item is still the same.



回答4:

Hashing is a technique useful for fast key lookup. It allows one to more efficiently find a value rather than scanning a list from beginning to end.



回答5:

Have you ever used a dictionary or a set? They're typically implemented in terms of a hashtable because the value associated with a key can be found quickly.

{
'WA': 'Washington',
'WV': 'West Virginia',
'WY': 'Wyoming'
}


标签: hash