Is it faster to search for a large string in a DB

Posted 2019-01-25 09:16

If I need to retrieve a large string from a DB, is it faster to search for it using the string itself, or would I gain by hashing the string, storing the hash in the DB as well, and then searching on the hash?

If yes, which hash algorithm should I use? (Security is not an issue; I am looking for performance.)

If it matters: I am using C# and MSSQL 2005.

10 answers
Viruses.
#2 · 2019-01-25 10:19

Though I've never done it, it sounds like this would work in principle. There's a chance of false positives (two different strings producing the same hash), but it is probably very slim.

I'd go with a fast algorithm such as MD5, since you don't want to spend longer hashing the string than it would have taken to just search for it.

The final thing I can say is that you'll only know if it is better if you try it out and measure the performance.
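As a rough illustration of "try it out and measure", here is a minimal sketch in Python with an in-memory SQLite database, used only because it is self-contained; it is not the asker's C#/MSSQL setup, and the numbers will not transfer. The table `docs`, the column `body_md5`, and the helper names are all made up for the example. Note that `body` has no index here, so this compares a scan on the string against an indexed hash lookup.

```python
import hashlib
import sqlite3
import timeit


def build_db(n_rows=2000, length=3000):
    """Create a hypothetical table of large strings plus their MD5 digests."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, body_md5 BLOB)"
    )
    rows = []
    for i in range(n_rows):
        # Each body is distinct and roughly `length` characters long.
        body = ("row-%d-" % i) * (length // 8)
        rows.append((body, hashlib.md5(body.encode("utf-8")).digest()))
    conn.executemany("INSERT INTO docs (body, body_md5) VALUES (?, ?)", rows)
    # Only the 16-byte hash column is indexed.
    conn.execute("CREATE INDEX ix_docs_md5 ON docs (body_md5)")
    conn.commit()
    return conn, rows[n_rows // 2][0]


def find_by_string(conn, target):
    # Compare the full string directly.
    return conn.execute(
        "SELECT id FROM docs WHERE body = ?", (target,)
    ).fetchone()


def find_by_hash(conn, target):
    # Hash the search string, look up the hash, and still compare the
    # full string so that a hash collision cannot return the wrong row.
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return conn.execute(
        "SELECT id FROM docs WHERE body_md5 = ? AND body = ?", (digest, target)
    ).fetchone()


conn, target = build_db()
t_string = timeit.timeit(lambda: find_by_string(conn, target), number=50)
t_hash = timeit.timeit(lambda: find_by_hash(conn, target), number=50)
print("by string: %.4fs   by hash: %.4fs" % (t_string, t_hash))
```

The timing for the hash variant includes computing the MD5 of the search string on every call, which is the cost the answer above is worried about.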

地球回转人心会变
#3 · 2019-01-25 10:19

TIP: If you are going to store the hash in the database, an MD5 hash is always 16 bytes, so it can be saved in a uniqueidentifier column (and held as a System.Guid in .NET).

This might offer some performance gain over saving hashes in a different way. (I use this method to detect changes in binary/ntext fields, but not for strings/nvarchars.)
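The 16-byte point can be sketched as follows. This is Python rather than C#, and `md5_as_guid` is a name invented for the example; it only shows that an MD5 digest has exactly the size of a GUID. One caveat worth checking in a real system: the textual form and byte order of a GUID built from raw bytes can differ between platforms (Python's `uuid.UUID(bytes=...)` and .NET's `System.Guid` do not lay out the first fields the same way), so build and compare the value on one side consistently.

```python
import hashlib
import uuid


def md5_as_guid(text):
    """Pack the 16-byte MD5 digest of `text` into a GUID-sized value."""
    digest = hashlib.md5(text.encode("utf-8")).digest()  # always 16 bytes
    return uuid.UUID(bytes=digest)


guid = md5_as_guid("some large string " * 200)
print(len(guid.bytes), guid)
```

The same input always yields the same value, so it can serve as a fixed-width lookup key regardless of how long the original string is.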

我欲成王,谁敢阻挡
#4 · 2019-01-25 10:19

The 'ideal' answer is definitely yes. String matching against an indexed column will always be slower than matching a hash value stored in an indexed column. This is what hash values are designed for: they take a large dataset (e.g. 3000 comparison points, one per character) and coalesce it into a smaller one (e.g. 16 comparison points, one per byte).

So, the most optimized string comparison tool will be slower than the optimized hash value comparison.

However, as has been noted, implementing your own optimized hash function is dangerous and likely to not go well. (I've tried and failed miserably.) Hash collisions are not particularly a problem, because when hashes match you just fall back on the string matching algorithm, which means the lookup would be (at worst) exactly as fast as your string comparison method.

But this is all assuming that your hashing is done in an optimal fashion (which it probably won't be), that there will not be any bugs in your hashing component (which there will be), and that the performance increase will be worth the effort (probably not). String comparison algorithms, especially on indexed columns, are already pretty fast, and the hashing effort (programmer time) is likely to be much higher than your possible gain.
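The collision fallback described here can be sketched like this. It is an in-memory Python illustration, not database code; `HashedStore` and `weak_hash` are hypothetical names, and `weak_hash` deliberately keeps only one byte of the MD5 digest so that collisions are guaranteed and the fallback path is actually exercised.

```python
import hashlib
from collections import defaultdict


def weak_hash(text):
    # Deliberately poor hash: only 256 possible values, to force collisions.
    return hashlib.md5(text.encode("utf-8")).digest()[:1]


class HashedStore:
    """Look up by hash first, then confirm with a full string comparison."""

    def __init__(self, hash_fn):
        self._hash_fn = hash_fn
        self._buckets = defaultdict(list)

    def add(self, text, value):
        self._buckets[self._hash_fn(text)].append((text, value))

    def get(self, text):
        # Every candidate with a matching hash is checked against the full
        # string, so a collision costs extra comparisons but never returns
        # the wrong entry.
        for stored, value in self._buckets.get(self._hash_fn(text), []):
            if stored == text:
                return value
        return None


store = HashedStore(weak_hash)
for i in range(1000):
    store.add("document body number %d" % i, i)

print(store.get("document body number 42"))
```

With a real 16-byte hash the buckets would almost always hold a single entry, so the string comparison runs once per lookup instead of once per row.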

And if you want to know about performance, Just Measure It.

姐就是有狂的资本
#5 · 2019-01-25 10:19

I am confused and am probably misunderstanding your question.

If you already have the string (thus you can compute the hash), why do you need to retrieve it?

Do you use a large string as the key for something, perhaps?
