search List for string .StartsWith()

2020-04-07 04:28发布

I have a

List<string>

with 1500 strings. I am now using the following code to pull out only string that start with the string prefixText.

foreach(string a in <MYLIST>)
{            
    if(a.StartsWith(prefixText, true, null))
    {
        newlist.Add(a);                   
    }            
}

This is pretty fast, but I'm looking for google fast. Now my question is if I arrange the List in alphabetical order, then compare char by char can I make this faster? Or any other suggestions on making this faster?

10条回答
混吃等死
2楼-- · 2020-04-07 05:11

The question to me is whether or not you'll need to do this one time or multiple times.

If you only find the StartsWithPrefix list one time, you can't get faster then leaving the original list as is and doing myList.Where(s => s.StartsWith(prefix)). This looks at every string one time so it's O(n)

If you need to find the StartsWithPrefix list several times, or maybe you're going to want to add or remove strings to the original list and update the StartsWithPrefix list then you should sort the original list and use binary search. But this will be sort time + search time = O(n log n) + 2 * O(log n)

If you did the binary search method, you would find the indexes of the first occurrence of your prefix and the last occurrence via search. Then do mySortedList.Skip(n).Take(m-n) where n is first index and m is last index.

Edit:

Wait a minute, we're using the wrong tool for the job. Use a Trie! If you put all your strings into a Trie instead of the list, all you have to do is walk down the trie with your prefix and grab all the words underneath that node.

查看更多
迷人小祖宗
3楼-- · 2020-04-07 05:14

I assume that the really fastest way would be to generate a dictionary with all possible prefixes from your 1500 strings, effectively precomputing the results for all possible searches that will return non-empty. Your search would then be simply a dictionary lookup completing in O(1) time. This is a case of trading memory (and initialization time) for speed.

private IDictionary<string, string[]> prefixedStrings;

public void Construct(IEnumerable<string> strings)
{
    this.prefixedStrings =
        (
            from s in strings
            from i in Enumerable.Range(1, s.Length)
            let p = s.Substring(0, i)
            group s by p
        ).ToDictionary(
            g => g.Key,
            g => g.ToArray());
}

public string[] Search(string prefix)
{
    string[] result;
    if (this.prefixedStrings.TryGetValue(prefix, out result))
        return result;

    return new string[0];
}
查看更多
你好瞎i
4楼-- · 2020-04-07 05:15

You can use PLINQ (Parallel LINQ) to make the execution faster:

var newList = list.AsParallel().Where(x => x.StartsWith(prefixText)).ToList()
查看更多
神经病院院长
5楼-- · 2020-04-07 05:15

You can accelerate a bit by comparing the first character before invoking StartsWith:

char first = prefixText[0];

foreach(string a in <MYLIST>) 
    {    
         if (a[0]==first)
         {        
            if(a.StartsWith(prefixText, true, null)) 
            { 
                newlist.Add(a);                    
            }
         }             
    } 
查看更多
▲ chillily
6楼-- · 2020-04-07 05:26

I would go with using Linq:

 var query = list.Where(w => w.StartsWith("prefixText")).Select(s => s).ToList();
查看更多
爷、活的狠高调
7楼-- · 2020-04-07 05:28

1500 is usually too few:

  • you could search it in parallel with a simple divide and conquer of the problem. Search each half of the list in two (or divide into three, four, ..., parts) different jobs/threads.

  • Or store the strings in a (not binary) tree instead. Will be O(log n).

  • sorted in alphabetical order you can do a binary search (sort of the same as the previous one)

查看更多
登录 后发表回答