Is it faster to search for a large string in a DB

Posted 2019-01-25 09:16

If I need to retrieve a large string from a DB, is it faster to search for it using the string itself, or would I gain by hashing the string, storing the hash in the DB as well, and then searching on the hash?

If so, which hash algorithm should I use? Security is not an issue; I am looking for performance.

If it matters: I am using C# and MSSQL 2005.

10 answers
欢心
Reply #2 · 2019-01-25 09:54

If you store the hash in a fixed-length field and index it, the lookup will probably be faster...

走好不送
Reply #3 · 2019-01-25 09:55

I'd be surprised if this offered a huge improvement, and I would recommend against hand-rolling your own performance optimisations for a DB search.

If you use a database index, there is scope for performance to be tuned by a DBA using tried and trusted methods. Hard-coding your own index optimisation will prevent this and may stop you benefiting from any indexing performance improvements in future versions of the DB.

狗以群分
Reply #4 · 2019-01-25 09:57

If your strings are short (generally fewer than 100 characters), comparing the strings directly will be faster.

If the strings are large, a hash search will most probably be faster.

HashBytes with MD4 seems to be the fastest on DML.

三岁会撩人
Reply #5 · 2019-01-25 10:04

In general: probably not, assuming the column is indexed. Database servers are designed to do such lookups quickly and efficiently. Some databases (e.g. Oracle) provide options to build indexes based on hashing.

However, in the end this can only be answered by performance testing with data and usage patterns that are representative of your requirements.

淡お忘
Reply #6 · 2019-01-25 10:11

First - MEASURE it. That is the only way to tell for sure.
Second - If you don't have an issue with the speed of the string searching, then keep it simple and don't use a Hash.
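
One way to set up such a measurement is a small timing harness. The sketch below is only an illustration of the shape of the test: it uses Python with an in-memory SQLite database as a stand-in for C# and MSSQL, and the table `t`, the columns `body` and `body_hash`, the row counts, and the 300-character shared prefix are all made-up assumptions. The absolute numbers it prints say nothing about SQL Server; you would rerun the same idea against your real server and data.

```python
import hashlib
import sqlite3
import time

PREFIX = "A" * 300  # assumed worst case: every string shares a long prefix


def sha1_hex(text: str) -> str:
    # SHA-1 chosen only because it is readily available; security is not the goal.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_db(n_rows: int) -> sqlite3.Connection:
    """Create a throwaway table with an index on the string and one on its hash."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (body TEXT NOT NULL, body_hash TEXT NOT NULL)")
    rows = [(PREFIX + str(i), sha1_hex(PREFIX + str(i))) for i in range(n_rows)]
    conn.executemany("INSERT INTO t (body, body_hash) VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX ix_body ON t (body)")
    conn.execute("CREATE INDEX ix_hash ON t (body_hash)")
    conn.commit()
    return conn


def time_lookups(conn, sql, bodies, make_params):
    """Return (elapsed seconds, number of hits) for one lookup per body."""
    start = time.perf_counter()
    hits = 0
    for body in bodies:
        # make_params runs inside the timed loop, so hashing the probe is counted.
        if conn.execute(sql, make_params(body)).fetchone() is not None:
            hits += 1
    return time.perf_counter() - start, hits


def compare(n_rows: int = 20_000, n_lookups: int = 2_000):
    conn = build_db(n_rows)
    bodies = [PREFIX + str(i) for i in range(n_lookups)]
    by_string = time_lookups(
        conn, "SELECT 1 FROM t WHERE body = ?", bodies, lambda b: (b,)
    )
    by_hash = time_lookups(
        conn,
        "SELECT 1 FROM t WHERE body_hash = ? AND body = ?",
        bodies,
        lambda b: (sha1_hex(b), b),
    )
    return {"by_string": by_string, "by_hash": by_hash}


if __name__ == "__main__":
    for name, (seconds, hits) in compare().items():
        print(f"{name}: {seconds:.4f}s for {hits} hits")
```

Run it several times and vary the row count and prefix length before drawing any conclusion; a single run on a warm in-memory database is not representative.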

However, to address your actual question (if only because it is an interesting thought): it depends on how similar the strings are. Remember that the DB engine doesn't need to compare all the characters in a string, only enough to find a difference. If you are looking through 10 million strings that all start with the same 300 characters, then the hash will almost certainly be faster. If, however, you are looking for the only string that starts with an x, then the string comparison could be faster. I think, though, that SQL Server will still have to read the entire string from disk, even if it then only uses the first byte (or the first few bytes for multi-byte characters), so the total string length will still have an impact.
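
To make the "only enough to find a difference" point concrete, here is a toy model in Python. It is not how any real engine compares strings (collations, byte encodings and page reads all complicate things), and the helper name `chars_compared` is made up for this illustration; it just counts how many character positions a naive left-to-right comparison has to look at.

```python
def chars_compared(a: str, b: str) -> int:
    """Count character positions examined by a naive left-to-right comparison.

    The comparison stops at the first position where the strings differ.
    If one string is a prefix of the other (or they are equal), every
    position of the shorter string has been examined.
    """
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            break
    return examined


if __name__ == "__main__":
    shared = "A" * 300
    # Strings that share a 300-character prefix: the difference is at position 301.
    print(chars_compared(shared + "x", shared + "y"))
    # Strings that differ immediately: one position is enough.
    print(chars_compared("x" + shared, "A" + shared))
```

With a long shared prefix, every candidate comparison pays for the whole prefix, which is the case where a short fixed-length hash has the best chance of winning.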

If you do try the hash comparison, you should make the hash an indexed calculated (computed) column. It will not be faster if you are working out the hashes for all the strings each time you run a query!
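
A minimal sketch of "store the hash once, index it, and search on it", again using Python and in-memory SQLite as a stand-in rather than MSSQL. Note the differences from what is suggested above: the hash here is computed in application code at insert time instead of in a computed column, SHA-1 is an arbitrary choice, and the names `docs`, `body`, `body_hash`, `add_doc` and `find_doc` are hypothetical.

```python
import hashlib
import sqlite3


def text_hash(text: str) -> bytes:
    """Fixed-length (20-byte) SHA-1 digest of the string."""
    return hashlib.sha1(text.encode("utf-8")).digest()


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " body_hash BLOB NOT NULL)"
    )
    # The index is on the short hash column, not on the large string.
    conn.execute("CREATE INDEX ix_docs_body_hash ON docs (body_hash)")
    return conn


def add_doc(conn: sqlite3.Connection, body: str) -> int:
    """Insert a string together with its hash, computed once at write time."""
    cur = conn.execute(
        "INSERT INTO docs (body, body_hash) VALUES (?, ?)",
        (body, text_hash(body)),
    )
    return cur.lastrowid


def find_doc(conn: sqlite3.Connection, body: str):
    """Exact-match lookup: narrow by hash, then confirm on the string itself."""
    row = conn.execute(
        "SELECT id FROM docs WHERE body_hash = ? AND body = ?",
        (text_hash(body), body),
    ).fetchone()
    return None if row is None else row[0]
```

Only the probe string is hashed at query time; the stored rows are never rehashed, which is the whole point of persisting and indexing the hash.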

You could also consider using SQL's CRC function. It produces an int, which will be even quicker to compare and is faster to calculate. But you will have to double-check the results of this query by actually testing the string values, because the CRC function is not designed for this sort of usage and is much more likely to return duplicate values. You will need to do the CRC or hash check in one query, then have an outer query that compares the strings. You will also want to watch the query execution plan (QEP) generated to make sure the optimiser is processing the query in the order you intended. It might decide to do the string comparisons first, then the CRC or hash checks second.
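
Here is a rough sketch of the checksum-plus-recheck idea, with the same caveats as any stand-in: it is Python with in-memory SQLite, not MSSQL; `zlib.crc32` plays the role of the database's CRC function; and the table, column and function names are hypothetical. For brevity it puts both predicates in one WHERE clause rather than an inner and outer query, and then inspects the plan to see whether the checksum index is actually used.

```python
import sqlite3
import zlib

FIND_SQL = "SELECT id FROM docs WHERE body_crc = ? AND body = ?"


def crc(text: str) -> int:
    """32-bit checksum as a non-negative int; collisions are expected."""
    return zlib.crc32(text.encode("utf-8"))


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " body_crc INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX ix_docs_body_crc ON docs (body_crc)")
    return conn


def add_doc(conn: sqlite3.Connection, body: str) -> int:
    cur = conn.execute(
        "INSERT INTO docs (body, body_crc) VALUES (?, ?)", (body, crc(body))
    )
    return cur.lastrowid


def find_doc(conn: sqlite3.Connection, body: str):
    """Narrow by checksum, then confirm with the string to reject collisions."""
    row = conn.execute(FIND_SQL, (crc(body), body)).fetchone()
    return None if row is None else row[0]


def plan_for_find(conn: sqlite3.Connection) -> str:
    """Return the planner's description of FIND_SQL (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + FIND_SQL, (0, "")).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = open_store()
    add_doc(conn, "some large string " * 200)
    print(plan_for_find(conn))  # check that the checksum index is mentioned
```

The string comparison in `find_doc` is what makes the result correct; the checksum only reduces how many rows need that comparison. On a real server you would read the actual execution plan in the same spirit.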

As someone else has pointed out, this is only any good if you are doing an exact match. A hash can't help if you are trying to do any sort of range or partial match.

虎瘦雄心在
Reply #7 · 2019-01-25 10:17

Are you doing an equality match or a containment match? For an equality match, you should let the DB handle this (but add a non-clustered index) and just test via WHERE table.Foo = @foo. For a containment match, you should perhaps look at a full-text index.
