Fastest Hash Algorithm for Text Data

2019-01-17 05:03发布

问题:

I'm trying to choose a hash algorithm for comparing about max 20 different text data.

Which hash is better for these requirements?

  • Less CPU Consumption
  • Small footprint (<=32 bytes)
  • Collision is not a big deal
  • Can be generated from .NET Framework 2 (shouldn't be a 3rd party library)

I'm using hash for less memory footprint and comparison performance

回答1:

If collision is not a big deal you can take the first letter of each document. Or you can use the length of the text or the string with the text.



回答2:

Paul Hsieh has a decent, simple, fast, 32-bit SuperFastHash that performs better than most existing hash functions, is easier to understand/implement, and sounds like it meets your criteria.



回答3:

The FNV hash is a well-known fast hashing algorithm. It is not cryptographically secure, but it sounds like you don't need a secure hash.



回答4:

A very quick check would be to take the length of a text and XOR it with the first 4 bytes of it and use that as a hash. If this is good enough it is extremely fast because independent of the number of bytes of the file.



回答5:

Check out the serie Peter Karkowski published on his blog.



回答6:

If you are constrained to algorithms that exist in the framework

Is MD5 small enough (16 bytes)?

Less CPU Consumption and Small footprint are usually mutually exclusive.

http://en.wikipedia.org/wiki/Time-space_tradeoff



回答7:

How long does the hash need to hold for? GetHashCode() is pretty accessible, gives a small response (4 bytes), which should be fine (re minimizing collisions) over 20 strings.

However, GetHashCode() should not be persisted to database - it is fine for in-memory comparisons, though. Just be aware that the algorithm may change between frameworks (and did between 1.1 and 2.0).

The other advantage of this is that it is trivial to use - just use a Dictionary<string,Something>, which will deal with all the hashing etc for you.



回答8:

I had the same request for myselve and i implemented xxHashSharp . Just make sure you take the appropriate library ( x32 vs x64). It's also available outside of c# here