What would the Big O be of a nested for loop with

2019-06-15 07:29发布

This questions is basically a follow-on from my answer here. I really wanted to say what the Big-O of this algorithm would be, but I wasn't sure my claim was completely sound.

So given two arrays:

B = [ "Hello World!", "Hello Stack Overflow!", "Foo Bar!", "Food is nice...", "Hej" ]
A = [ "World", "Foo" ]

what is the Big O of:

List<string> results = new List<string>();
foreach (string test in B)
{
   if (A.Any(a => test.Contains(a))
      results.Add(test);
}

I believe it to be somewhere between O(n) and O(n^2) as it depends where in the result the Any() matches...

5条回答
相关推荐>>
2楼-- · 2019-06-15 07:40

.Any() should be O(n) as it searches the container until it finds the first matching instance. So that, in a foreach loop should be O(n^2).

查看更多
【Aperson】
3楼-- · 2019-06-15 07:45

I assume you provide A and B just for example of their contents, and you are aware that complexity has meaning only for sets of inputs (like average, worst and best cases), not for individual inputs.

My point is, depending on the problem requirements and use case, you can make very different estimates for the code complexity.

Let n be A.Length and m be B.Length. Complexity of the given code then can be calculated in a few different ways:

  1. Assume string.Contains is O(1). In practice, such strong assumption can be made, for example, if we know for sure that no string is longer than some fixed in advance length. Then code complexity is of course O(n*m).

  2. Assume string.Contains is O(x*y), where x and y are lengths of haystack and needle. Let the longest string in A be of length k_A and the longest string in B be of length k_B. Then code complexity would be O(n*m*k_A*k_B)

For the second case, there is another (more practical) approach:

Let S_A be the sum of lengths for all strings in A, and S_B be the sum of length for all strings in B. Then code complexity would be O(S_A * S_B).

That's it for the worst case. For the average case, though, and for the most practical cases, string.Contains is O(x + y). So average code complexity would be O(n*m*(k_A + k_B)) or O((S_B + k_A) * n + (S_A + k_B) * m).

查看更多
混吃等死
4楼-- · 2019-06-15 07:49

It's close, however as others mention it's going to be O(n*m) as the sizes of each of your collections differ (with the best case being O(n) and worst being O(n*m)) otherwise, if you were iterating the same size collection twice, you'd get O(n^2).

Taking a Look Behind the Scenes at Any()

You can take a look at the source for the Enumerable.Any() method to dig into this a bit further. You'll see a foreach loop to iterate through until it finds a match for the predicate which reinforces your assumption of it being O(n):

public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
   if (source == null) throw Error.ArgumentNull("source");
   if (predicate == null) throw Error.ArgumentNull("predicate");
   // Another loop to iterate through the collection until the predicate is matched
   foreach (TSource element in source) {
        if (predicate(element)) return true;
   }
   return false;
}

As you can see Any() by itself is going to be O(n) for it's length and nesting that inside of an existing loop O(m) should give you O(n*m) for the entire code. However, it's possible that it could be as low as O(m).

查看更多
对你真心纯属浪费
5楼-- · 2019-06-15 07:58

it's essentially a nested for loop, so the big O should be O(n^2) for worst case scenario

查看更多
祖国的老花朵
6楼-- · 2019-06-15 08:00

Let length of A be N and length of B be M. We have two extremal cases:

  1. The worst one:

    a => test.Contains(a) 
    

    returns false on every a, so A.Any has to scan the entire A and we have

    O(N * M) 
    
  2. The best one:

    a => test.Contains(a) 
    

    returns true on the very 1st item of A and thus A.Any returns immediately and we have only

    O(M)
    

Actual complexity is somewhere in between (both borders included):

    [O(M)..O(M*N)]
查看更多
登录 后发表回答