I have a
List<string>
with 1500 strings. I am now using the following code to pull out only string that start with the string prefixText.
foreach(string a in <MYLIST>)
{
if(a.StartsWith(prefixText, true, null))
{
newlist.Add(a);
}
}
This is pretty fast, but I'm looking for google fast. Now my question is if I arrange the List in alphabetical order, then compare char by char can I make this faster? Or any other suggestions on making this faster?
The question to me is whether or not you'll need to do this one time or multiple times.
If you only find the StartsWithPrefix list one time, you can't get faster then leaving the original list as is and doing
myList.Where(s => s.StartsWith(prefix))
. This looks at every string one time so it'sO(n)
If you need to find the StartsWithPrefix list several times, or maybe you're going to want to add or remove strings to the original list and update the StartsWithPrefix list then you should sort the original list and use binary search. But this will be
sort time + search time = O(n log n) + 2 * O(log n)
If you did the binary search method, you would find the indexes of the first occurrence of your prefix and the last occurrence via search. Then do
mySortedList.Skip(n).Take(m-n)
where n is first index and m is last index.Edit:
Wait a minute, we're using the wrong tool for the job. Use a Trie! If you put all your strings into a Trie instead of the list, all you have to do is walk down the trie with your prefix and grab all the words underneath that node.
I assume that the really fastest way would be to generate a dictionary with all possible prefixes from your 1500 strings, effectively precomputing the results for all possible searches that will return non-empty. Your search would then be simply a dictionary lookup completing in O(1) time. This is a case of trading memory (and initialization time) for speed.
You can use PLINQ (Parallel LINQ) to make the execution faster:
You can accelerate a bit by comparing the first character before invoking StartsWith:
I would go with using Linq:
1500 is usually too few:
you could search it in parallel with a simple divide and conquer of the problem. Search each half of the list in two (or divide into three, four, ..., parts) different jobs/threads.
Or store the strings in a (not binary) tree instead. Will be O(log n).
sorted in alphabetical order you can do a binary search (sort of the same as the previous one)