Most people with a degree in CS will certainly know what Big O stands for. It helps us measure how (in)efficient an algorithm really is, and if you know which category the problem you are trying to solve lies in, you can figure out whether it is still possible to squeeze out that little extra performance. [1]
But I'm curious, how do you calculate or approximate the complexity of your algorithms?
[1] But as they say, don't overdo it: premature optimization is the root of all evil, and optimization without a justified cause deserves that name as well.
Big O notation is useful because it's easy to work with and hides unnecessary complications and details (for some definition of unnecessary). One nice way of working out the complexity of divide and conquer algorithms is the tree method. Let's say you have a version of quicksort with the median procedure, so you split the array into perfectly balanced subarrays every time.
Now build a tree corresponding to all the arrays you work with. At the root you have the original array, the root has two children which are the subarrays. Repeat this until you have single element arrays at the bottom.
Since we can find the median in O(n) time and split the array into two parts in O(n) time, the work done at each node is O(k), where k is the size of the array at that node. Each level of the tree contains (at most) the entire array, so the work per level is O(n) (the sizes of the subarrays add up to n, and since we do O(k) work per node we can add this up across the level). There are only log(n) levels in the tree since each time we halve the input.
Therefore we can upper bound the amount of work by O(n*log(n)).
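A quick way to get the same bound is the standard recurrence for this perfectly balanced split (a sketch; c is an assumed constant covering the median-finding and partitioning cost per element):

    T(n) = 2*T(n/2) + c*n,  with  T(1) = c

Each of the log(n) levels of the tree contributes c*n work, so T(n) = c*n*log(n) + c*n = O(n*log(n)).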
However, Big O hides some details which we sometimes can't ignore. Consider computing the Fibonacci sequence with
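    // a minimal sketch, assuming the straightforward iterative version
    // with BigInteger a and b as described below
    import java.math.BigInteger;

    static BigInteger fib(int n) {
        BigInteger a = BigInteger.ZERO;   // F(0)
        BigInteger b = BigInteger.ONE;    // F(1)
        for (int i = 0; i < n; i++) {
            BigInteger tmp = b;
            b = a.add(b);                 // BigInteger addition costs O(number of digits), not O(1)
            a = tmp;
        }
        return a;                         // F(n)
    }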
and let's just assume that a and b are BigIntegers in Java or something else that can handle arbitrarily large numbers. Most people would say this is an O(n) algorithm without flinching. The reasoning is that you have n iterations in the for loop and O(1) work inside the loop.
But Fibonacci numbers are large: the n-th Fibonacci number is exponential in n, so just storing it takes on the order of n bytes. Performing an addition with such big integers takes O(n) work. So the total amount of work done in this procedure is
1 + 2 + 3 + ... + n = n(n+1)/2 = O(n^2)
So this algorithm runs in quadratic time!
In addition to using the master method (or one of its specializations), I test my algorithms experimentally. This can't prove that any particular complexity class is achieved, but it can provide reassurance that the mathematical analysis is appropriate. To help with this reassurance, I use code coverage tools in conjunction with my experiments, to ensure that I'm exercising all the cases.
As a very simple example say you wanted to do a sanity check on the speed of the .NET framework's list sort. You could write something like the following, then analyze the results in Excel to make sure they did not exceed an n*log(n) curve.
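A sketch of that idea in Java rather than C# (counting comparisons through the comparator passed to the sort, and printing CSV you can paste into Excel):

    import java.util.*;
    import java.util.concurrent.atomic.AtomicLong;

    public class SortComplexityCheck {
        public static void main(String[] args) {
            Random rng = new Random(42);
            System.out.println("n,comparisons");
            for (int n = 1_000; n <= 1_000_000; n *= 10) {
                List<Integer> data = new ArrayList<>(n);
                for (int i = 0; i < n; i++) data.add(rng.nextInt());

                AtomicLong comparisons = new AtomicLong();
                data.sort((a, b) -> {                // count every comparison the sort makes
                    comparisons.incrementAndGet();
                    return Integer.compare(a, b);
                });
                System.out.println(n + "," + comparisons.get());
            }
        }
    }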
In this example I measure the number of comparisons, but it's also prudent to examine the actual time required for each sample size. However, you must then be even more careful that you are measuring just the algorithm and not including artifacts from your test infrastructure.
I don't know how to solve this programmatically, but the first thing people do is sample the algorithm for certain patterns in the number of operations done, say 4n^2 + 2n + 1. We have two rules:

1. If we have a sum of terms, the term with the largest growth rate is kept, and the other terms are omitted.
2. If we have a product of several factors, any constants (factors not dependent on n) are omitted.
If we simplify f(x), where f(x) is the formula for the number of operations done (4n^2 + 2n + 1 in the example above), we obtain the big-O value, O(n^2) in this case. But doing this automatically would have to account for something like Lagrange interpolation in the program, which may be hard to implement. And what if the real big-O value were O(2^n), or something like O(x^n)? So this approach probably wouldn't be programmable. But if someone proves me wrong, give me the code . . . .
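As a crude empirical stand-in for that kind of curve fitting, one can at least check how the operation count grows when the input size doubles (a sketch; countOps here is a hypothetical hook that returns how many operations your algorithm performed for size n):

    // Estimate the growth exponent by sampling operation counts at doubling sizes.
    // countOps(n) is a hypothetical instrumented call, not part of any real library.
    static void estimateGrowth(java.util.function.LongUnaryOperator countOps) {
        long prev = countOps.applyAsLong(1_000);
        for (int n = 2_000; n <= 64_000; n *= 2) {
            long curr = countOps.applyAsLong(n);
            // ratio ~2 suggests O(n), ~4 suggests O(n^2), ~8 suggests O(n^3)
            double ratio = (double) curr / prev;
            System.out.printf("n=%d ops=%d ratio=%.2f (exponent ~%.2f)%n",
                    n, curr, ratio, Math.log(ratio) / Math.log(2));
            prev = curr;
        }
    }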
As to "how do you calculate" Big O, this is part of Computational complexity theory. For some (many) special cases you may be able to come with some simple heuristics (like multiplying loop counts for nested loops), esp. when all you want is any upper bound estimation, and you do not mind if it is too pessimistic - which I guess is probably what your question is about.
If you really want to answer your question for any algorithm, the best you can do is to apply the theory. Besides simplistic "worst case" analysis I have found Amortized analysis very useful in practice.
Basically, the thing that crops up 90% of the time is just analyzing loops. Do you have single, double, or triple nested loops? Then you have O(n), O(n^2), or O(n^3) running time.
Very rarely (unless you are writing a platform with an extensive base library, like for instance the .NET BCL or C++'s STL) will you encounter anything that is more difficult than just looking at your loops (for statements, while, goto, etc.).
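A minimal Java illustration of that rule of thumb (countPairs is just an illustrative example, not taken from the question):

    // Rule of thumb: the nesting depth drives the exponent.
    static long countPairs(int[] a) {
        long count = 0;
        for (int i = 0; i < a.length; i++) {        // single loop  -> O(n)
            for (int j = 0; j < a.length; j++) {    // nested loop  -> O(n^2) overall
                if (a[i] == a[j]) count++;          // constant work per iteration
            }
        }
        return count;
    }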
I'm a professor's assistant at my local university for the Data Structures and Algorithms course. I'll do my best to explain it here in simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information in Chapter 2 of the Data Structures and Algorithms in Java book.
There is no mechanical procedure that can be used to get the BigOh.
As a "cookbook", to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many steps of computations get executed given an input of some size.
The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The lesser the number of steps, the faster the algorithm.
For example, let's say you have this piece of code:
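    // a minimal sketch of the function being analyzed (the numbered comments
    // are the line numbers referred to below)
    int sum(int[] data) {
        int result = 0;                             // 1
        for (int i = 0; i < data.length; i++) {     // 2
            result += data[i];                      // 3
        }
        return result;                              // 4
    }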
This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:
    Number_Of_Steps = f(data.length)

So we have f(N), a function to count the number of computational steps. The input of the function is the size of the structure to process. It means that this function is called such as:

    Number_Of_Steps = f(N)

The parameter N takes the data.length value. Now we need the actual definition of the function f(). This is done from the source code, in which each interesting line is numbered from 1 to 4.

There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn't depend on the size of the input data takes a constant C number of computational steps.

We are going to add up the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the data array. That means that lines 1 and 4 take C steps each, and the function is somewhat like this:

    f(N) = C + ??? + C
The next part is to define the value of the for statement. Remember that we are counting the number of computational steps, meaning that the body of the for statement gets executed N times. That's the same as adding C, N times:

    f(N) = C + (C + C + ... + C) + C = C + N * C + C

There is no mechanical rule to count how many times the body of the for gets executed; you need to count it by looking at what the code does. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the for statement.

To get the actual BigOh we need the Asymptotic analysis of the function. This is roughly done like this:

1. Take away all the constants C.
2. From f(), get the polynomial in its standard form.
3. Keep the term that grows bigger when N approaches infinity.

Our f() has two terms:

    f(N) = 2 * C * N^0 + 1 * C * N^1

Taking away all the C constants and redundant parts:

    f(N) = 1 + N^1

Since the last term is the one which grows bigger when f() approaches infinity (think on limits), this is the BigOh argument, and the sum() function has a BigOh of:

    O(N)

There are a few tricks to solve some tricky ones: use summations whenever you can.
As an example, this code can be easily solved using summations:
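    // a sketch consistent with the analysis below; foo() is assumed to be O(1)
    static void foo() { /* some O(1) work */ }

    static void example(int n) {
        for (int i = 0; i < 2 * n; i += 2) {    // 1: index ends at 2 * n, incremented by two
            for (int j = n; j > i; j--) {        // 2: how often this runs depends on i
                foo();                           // 3
            }
        }
    }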
The first thing you need to ask is the order of execution of foo(). While it is usually O(1), you need to ask your professors about it. O(1) means (almost, mostly) constant C, independent of the size N.

The for statement on sentence number one is tricky. While the index ends at 2 * N, the increment is done by two. That means that the first for gets executed only N steps, and we need to divide the count by two.

Sentence number two is even trickier since it depends on the value of i. Take a look: the index i takes the values 0, 2, 4, 6, 8, ..., 2 * N, and the second for gets executed N times the first time, N - 2 times the second, N - 4 times the third... up to the N / 2 stage, on which the second for never gets executed.

As a formula, that means:

    f(N) = C + Summation(i from 1 to N)( Summation(j from 1 to N - 2 * i)( C ) )
Again, we are counting the number of steps. And by definition, every summation should always start at one, and end at a number bigger than or equal to one.

(We are assuming that foo() is O(1) and takes C steps.)

We have a problem here: when i takes the value N / 2 + 1 upwards, the inner Summation ends at a negative number! That's impossible and wrong. We need to split the summation in two, the pivotal point being the moment i takes N / 2 + 1:

    f(N) = C + Summation(i from 1 to N / 2)( Summation(j from 1 to N - 2 * i)( C ) ) + Summation(i from N / 2 + 1 to N)( C )

From the pivotal moment onwards (i > N / 2) the inner for won't get executed, so we assume a constant C execution complexity on its body.

Now the summations can be simplified using some identity rules:

1. Summation(w from 1 to N)( C ) = N * C
2. Summation(w from 1 to N)( A (+/-) B ) = Summation(w from 1 to N)( A ) (+/-) Summation(w from 1 to N)( B )
3. Summation(w from 1 to N)( w * C ) = C * Summation(w from 1 to N)( w ) (C is a constant, independent of w)
4. Summation(w from 1 to N)( w ) = (N * (N + 1)) / 2

Applying some algebra:

    f(N) = C + Summation(i from 1 to N / 2)( (N - 2 * i) * C ) + (N / 2) * C
         = C + C * ( (N / 2) * N - 2 * Summation(i from 1 to N / 2)( i ) ) + (N / 2) * C
         = C + C * ( N^2 / 2 - (N / 2) * (N / 2 + 1) ) + (N / 2) * C
         = C + C * ( N^2 / 4 - N / 2 ) + (N / 2) * C
         = C + C * N^2 / 4

And the BigOh is:

    O(N^2)