Is HashSet the fastest container to look up in?

2019-03-12 19:19发布

I need to check that specific string contains in the set of others:

private bool Contains(string field)
{
   return this.Fields.Contains(field); // HashSet<string> local property
}

What is the best type of container to use if only one task of it - to hold a number of strings and check does another one is into or does not?

2条回答
Deceive 欺骗
2楼-- · 2019-03-12 19:54

Does HashSet work? Sure. But that's not the question you asked. You asked for the fastest possible lookup.

Is it the fastest possible? No, of course not, not by any measure.

First off, in order to talk about "fastest" we need to describe precisely what "fastest" means. Do you mean:

  • smallest worst possible case timing
  • smallest average timing averaged over many timings
  • smallest average timing given a particular usage pattern
  • something else

? Please clarify precisely what "fastest possible" means. We can devise you an algorithm which is the in theory fastest possible only if we know precisely what fastest possible means to you.

For example, suppose you are writing a compiler. Something we have to do all the time in compilers is check whether a particular string is in a list of strings. Perhaps we are checking to see if a string is a keyword, so we have to look up whether a given string is inside the set {"int", "double", "for", "foreach", "class" ... }

We could put those in a hash set and get decent performance. But if we wanted the best possible performance we could do a lot better. We could, for example, do an analysis of a few billion lines of existing source code to find out which keywords were most common and which were least common, and then write a custom hash table that optimized for (1) rapidly rejecting things that were not keywords at all, and (2) rapidly recognizing the most common keywords at the expense of recognizing other keywords.

Note that this requires static analysis; though it performs well on typical cases, it performs poorly on those rare cases where there are lots of rare keywords used. Another approach we could take would be to write a self tuning hash table that dynamically identified when particular strings were being searched for frequently.

Consider, for example, if you are writing an implementation of the JScript runtime. We frequently must look for a string in a set of strings:

for(i = 0; i < 10; ++i) { foo.bar(i); }

Here we must look up the string "bar" inside the object identified by "foo" ten times. The hash table inside "foo" that implements that lookup notices the first time through the loop that "bar" has been used, so it dynamically tweaks the hash table structure so that the second time through the loop, the lookup is faster. This is the strategy we employed in our implementation of JScript.

Now, that optimizes the case for loops, but it makes this case potentially slower than it could be:

for(i = 0; i < 10; ++i) { foo.bar(i); foo.blah(i); foo.abc(i); }

because we don't do more analysis and realize "hey, we just re-optimized this hash table three times, and now we're going to do it all again, maybe we should just leave it as it is."

Fortunately for us, we were not, like you, looking for the fastest possible lookup. We were only looking for a reasonably fast lookup.

Can you carefully and completely describe what exactly your usage case is for the fastest possible lookup? There are lots of algorithms you can use to speed up lookups, but they get very complicated.

查看更多
Explosion°爆炸
3楼-- · 2019-03-12 19:58

Yes, HashSet is perfect for this since it contains one value to look up unlike a Dictionary which requires a key and a value.

查看更多
登录 后发表回答