This question is basically a follow-on from my answer here. I really wanted to say what the Big-O of this algorithm would be, but I wasn't sure my claim was completely sound.
So given two arrays:
B = [ "Hello World!", "Hello Stack Overflow!", "Foo Bar!", "Food is nice...", "Hej" ]
A = [ "World", "Foo" ]
what is the Big O of:
List<string> results = new List<string>();
foreach (string test in B)
{
    if (A.Any(a => test.Contains(a)))
        results.Add(test);
}
I believe it to be somewhere between O(n) and O(n^2), as it depends on where in the collection the Any() finds a match...
.Any() should be O(n), as it searches the container until it finds the first matching instance. So that, inside a foreach loop, should be O(n^2).
I assume you provide A and B just as examples of their contents, and you are aware that complexity has meaning only for sets of inputs (like average, worst and best cases), not for individual inputs.

My point is, depending on the problem requirements and use case, you can make very different estimates for the code complexity.
Let n be A.Length and m be B.Length. The complexity of the given code can then be calculated in a few different ways:

Assume string.Contains is O(1). In practice, such a strong assumption can be made, for example, if we know for sure that no string is longer than some length fixed in advance. Then the code complexity is of course O(n*m).
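A quick back-of-the-envelope count for this case: the outer loop runs m times, and each Any call evaluates the predicate at most n times at O(1) cost each, so

$$ T(n, m) \;\le\; \sum_{b \in B} n \cdot O(1) \;=\; O(n \cdot m). $$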
Assume string.Contains is O(x*y), where x and y are the lengths of the haystack and the needle. Let the longest string in A be of length k_A and the longest string in B be of length k_B. Then the code complexity would be O(n*m*k_A*k_B).
For the second case, there is another (more practical) approach: let S_A be the sum of the lengths of all strings in A, and S_B be the sum of the lengths of all strings in B. Then the code complexity would be O(S_A * S_B).
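To see roughly where that bound comes from (still under the O(x*y) assumption for string.Contains), note that in the worst case every string of B is checked against every string of A, so the total work is bounded by

$$ \sum_{b \in B} \sum_{a \in A} |b| \cdot |a| \;=\; \Big(\sum_{b \in B} |b|\Big) \Big(\sum_{a \in A} |a|\Big) \;=\; S_B \cdot S_A. $$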
That's it for the worst case. For the average case, though, and for most practical cases, string.Contains is O(x + y). So the average code complexity would be O(n*m*(k_A + k_B)) or O((S_B + k_A) * n + (S_A + k_B) * m).
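As a rough sketch of where the second form comes from (assuming string.Contains is O(x + y) on average): for a fixed b in B, the Any call scans at most all n strings of A, and summing that over all m strings of B gives

$$ \sum_{b \in B} \sum_{a \in A} \big(|b| + |a|\big) \;=\; \sum_{b \in B} \big(n \, |b| + S_A\big) \;=\; n \, S_B + m \, S_A, $$

which is within the stated O((S_B + k_A) * n + (S_A + k_B) * m) bound.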
It's close; however, as others mention, it's going to be O(n*m) since the sizes of your two collections differ (with the best case being O(n) and the worst being O(n*m)). Otherwise, if you were iterating a collection of the same size twice, you'd get O(n^2).
Taking a Look Behind the Scenes at Any()
You can take a look at the source for the Enumerable.Any() method to dig into this a bit further. You'll see a foreach loop that iterates through until it finds a match for the predicate, which reinforces your assumption of it being O(n):
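A simplified sketch of that implementation (not the exact framework source, but the same shape) looks roughly like this:

using System;
using System.Collections.Generic;

public static class EnumerableSketch
{
    // Roughly the shape of Enumerable.Any(source, predicate):
    // scan the sequence and stop as soon as the predicate matches.
    public static bool Any<TSource>(this IEnumerable<TSource> source,
                                    Func<TSource, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        foreach (TSource element in source)
        {
            if (predicate(element))
                return true;   // early exit on the first match
        }

        return false;          // no match: every element was checked
    }
}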
As you can see, Any() by itself is going to be O(n) for its length, and nesting that inside of an existing O(m) loop should give you O(n*m) for the entire code. However, it's possible that it could be as low as O(m).
It's essentially a nested for loop, so the Big O should be O(n^2) for the worst-case scenario.
Let the length of A be N and the length of B be M. We have two extremal cases:

The worst one: test.Contains(a) returns false on every a, so A.Any has to scan the entire A and we have O(N*M).

The best one: test.Contains(a) returns true on the very 1st item of A, and thus A.Any returns immediately, so we have only O(M).

Actual complexity is somewhere in between (both borders included): O(M) <= actual <= O(N*M).
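To make the two extremes concrete, here's a small sketch (with hypothetical inputs, not the ones from the question) that counts how many times the predicate, and therefore Contains, actually runs:

using System;
using System.Linq;

class AnyCostDemo
{
    static void Main()
    {
        // Hypothetical inputs chosen to hit the two extremes (N = 2, M = 3).
        string[] B = { "alpha", "beta", "gamma" };
        string[] bestA  = { "a", "zzz" };    // every test string contains "a" -> 1 predicate call each
        string[] worstA = { "zzz", "qqq" };  // nothing ever matches -> N calls per test string

        Console.WriteLine(CountPredicateCalls(bestA, B));   // 3  (= M)
        Console.WriteLine(CountPredicateCalls(worstA, B));  // 6  (= N*M)
    }

    // Same shape as the code in the question, but counting predicate calls.
    static int CountPredicateCalls(string[] A, string[] B)
    {
        int calls = 0;
        foreach (string test in B)
        {
            A.Any(a => { calls++; return test.Contains(a); });
        }
        return calls;
    }
}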