This question is basically a follow-on from my answer here. I really wanted to say what the Big-O of this algorithm would be, but I wasn't sure my claim was completely sound.
So given two arrays:
B = [ "Hello World!", "Hello Stack Overflow!", "Foo Bar!", "Food is nice...", "Hej" ]
A = [ "World", "Foo" ]
what is the Big O of:
List<string> results = new List<string>();
foreach (string test in B)
{
    if (A.Any(a => test.Contains(a)))
results.Add(test);
}
I believe it to be somewhere between O(n) and O(n^2), as it depends on where in the collection Any() finds a match...
Let the length of A be N and the length of B be M. We have two extremal cases:

The worst one: a => test.Contains(a) returns false for every a, so A.Any has to scan the entire A, and we have O(N * M).

The best one: a => test.Contains(a) returns true on the very first item of A, so A.Any returns immediately, and we have only O(M).

The actual complexity is somewhere in between (both bounds included): [O(M)..O(M*N)].
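To make the two extremes concrete, here is a small instrumented sketch (the arrays and the calls counter are made up for illustration): in the first input every test string matches the first element of A, so Any stops after one predicate call per outer iteration; in the second, nothing ever matches, so Any scans all of A every time.

using System;
using System.Collections.Generic;
using System.Linq;

class ExtremalCases
{
    // Counts predicate evaluations so we can see M vs N * M.
    static int calls;

    static List<string> Run(string[] B, string[] A)
    {
        var results = new List<string>();
        foreach (string test in B)
        {
            if (A.Any(a => { calls++; return test.Contains(a); }))
                results.Add(test);
        }
        return results;
    }

    static void Main()
    {
        string[] A = { "a", "b", "c" };                  // N = 3

        calls = 0;                                       // best case: every test
        Run(new[] { "aaa", "aaa", "aaa", "aaa" }, A);    // matches A[0] immediately
        Console.WriteLine(calls);                        // prints 4  -> O(M)

        calls = 0;                                       // worst case: nothing matches,
        Run(new[] { "xxx", "xxx", "xxx", "xxx" }, A);    // so all of A is scanned each time
        Console.WriteLine(calls);                        // prints 12 -> O(N * M)
    }
}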
It's close; however, as others mention, it's going to be O(n*m), since the sizes of your two collections differ (with the best case being O(n) and the worst being O(n*m)). If you were iterating over the same-size collection twice, you'd get O(n^2).
Taking a Look Behind the Scenes at Any()
You can take a look at the source for the Enumerable.Any() method to dig into this a bit further. You'll see a foreach loop that iterates through the source until it finds a match for the predicate, which reinforces your assumption that it is O(n):
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    // Another loop to iterate through the collection until the predicate is matched
    foreach (TSource element in source) {
        if (predicate(element)) return true;
    }
    return false;
}
As you can see, Any() by itself is going to be O(n) in the length of its source, and nesting it inside an existing O(m) loop should give you O(n*m) for the code as a whole. However, it's possible that it could be as low as O(m).
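Substituting that implementation into the original loop makes the nesting explicit. This is just an illustrative rewrite of the code from the question, not a suggested change:

List<string> results = new List<string>();
foreach (string test in B)                  // outer loop: m iterations
{
    // Inlined Enumerable.Any(): scan A until the predicate matches,
    // up to n iterations per outer pass -> O(n*m) predicate calls in total.
    bool any = false;
    foreach (string a in A)
    {
        if (test.Contains(a)) { any = true; break; }
    }
    if (any)
        results.Add(test);
}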
I assume you provided A and B just as examples of their contents, and you are aware that complexity is only meaningful for sets of inputs (like the average, worst, and best cases), not for individual inputs. My point is that, depending on the problem requirements and the use case, you can make very different estimates of the code's complexity.
Let n be A.Length and m be B.Length. The complexity of the given code can then be calculated in a few different ways.

Assume string.Contains is O(1). In practice, such a strong assumption can be made, for example, if we know for sure that no string is longer than some length fixed in advance. Then the code's complexity is of course O(n*m).
Assume string.Contains is O(x*y), where x and y are the lengths of the haystack and the needle. Let the longest string in A be of length k_A and the longest string in B be of length k_B. Then the code's complexity would be O(n*m*k_A*k_B).
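To see where the O(x*y) figure comes from, here is a sketch of a naive substring scan; this is not the actual string.Contains implementation, just the simplest algorithm with that worst-case shape:

// Illustrative only -- NOT the BCL implementation of string.Contains.
static bool NaiveContains(string haystack, string needle)
{
    // Up to x starting positions in the haystack...
    for (int i = 0; i + needle.Length <= haystack.Length; i++)
    {
        // ...with up to y character comparisons at each position,
        // e.g. haystack "aaaaaaaa" vs needle "aaab" -> O(x*y) worst case.
        int j = 0;
        while (j < needle.Length && haystack[i + j] == needle[j])
            j++;
        if (j == needle.Length)
            return true;
    }
    return false;
}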
For the second case, there is another (more practical) approach: let S_A be the sum of the lengths of all strings in A, and S_B be the sum of the lengths of all strings in B. Then the code's complexity would be O(S_A * S_B).
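To spell out where that product comes from: summing the per-pair O(x*y) cost over all pairs of strings factors neatly into the two sums:

Σ_{a in A} Σ_{b in B} |a|·|b| = (Σ_{a in A} |a|) · (Σ_{b in B} |b|) = S_A · S_B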
That's it for the worst case. For the average case, though, and for most practical cases, string.Contains is O(x + y). So the average complexity of the code would be O(n*m*(k_A + k_B)) or O((S_B + k_A) * n + (S_A + k_B) * m).
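One way to see where those sums of lengths enter, under the O(x + y) average-cost assumption: each of the n*m Contains calls costs on the order of |a| + |b|, so summing over all pairs gives

Σ_{b in B} Σ_{a in A} (|a| + |b|) = m·S_A + n·S_B

which is consistent with (and dominated by) the O((S_B + k_A) * n + (S_A + k_B) * m) bound above.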
.Any() should be O(n), as it searches the container until it finds the first matching instance. So that, inside a foreach loop, should be O(n^2).
It's essentially a nested for loop, so the Big-O should be O(n^2) in the worst-case scenario.