Big O, how do you calculate/approximate it?

Posted 2018-12-30 23:52

Most people with a degree in CS will certainly know what Big O stands for. It helps us measure how (in)efficient an algorithm really is, and if you know which category the problem you are trying to solve lies in, you can figure out whether it is still possible to squeeze out that little extra performance.1

But I'm curious, how do you calculate or approximate the complexity of your algorithms?

1 But as they say, don't overdo it: premature optimization is the root of all evil, and optimization without a justified cause deserves that name as well.

22 answers
初与友歌
#2 · 2018-12-31 00:23

For the 1st case, the inner loop is executed n − i times, so the total number of executions is the sum of (n − i) for i going from 0 to n − 1 (lower than, not lower than or equal). That sum is n + (n − 1) + ... + 1 = n(n + 1)/2, so the complexity is O(n²/2) = O(n²).
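
A minimal sketch (assuming the question's 1st case is a triangular nested loop of roughly this shape; the original code is not reproduced here) that counts the inner-body executions and confirms the n(n + 1)/2 total:

#include <stdio.h>

int main(void)
{
    int n = 10;
    long count = 0;                     /* number of times the inner body runs */
    for (int i = 0; i < n; i++)         /* outer loop: i = 0 .. n-1 */
        for (int j = i; j < n; j++)     /* inner loop runs n - i times */
            count++;
    printf("count = %ld, n*(n+1)/2 = %d\n", count, n * (n + 1) / 2);
    return 0;
}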

For the 2nd case, i goes from 0 to n inclusive in the outer loop; the inner loop is then executed only when j is strictly greater than n, which never happens, so the inner body is never reached.

人气声优
#3 · 2018-12-31 00:25

Let's start from the beginning.

First of all, accept the principle that certain simple operations on data can be done in O(1) time, that is, in time that is independent of the size of the input. These primitive operations in C consist of

  1. Arithmetic operations (e.g. + or %).
  2. Logical operations (e.g., &&).
  3. Comparison operations (e.g., <=).
  4. Structure accessing operations (e.g. array-indexing like A[i], or pointer following with the -> operator).
  5. Simple assignment such as copying a value into a variable.
  6. Calls to library functions (e.g., scanf, printf).

The justification for this principle requires a detailed study of the machine instructions (primitive steps) of a typical computer. Each of the described operations can be done with some small number of machine instructions; often only one or two instructions are needed. As a consequence, several kinds of statements in C can be executed in O(1) time, that is, in some constant amount of time independent of input. These simple statements include

  1. Assignment statements that do not involve function calls in their expressions.
  2. Read statements.
  3. Write statements that do not require function calls to evaluate arguments.
  4. The jump statements break, continue, goto, and return expression, where expression does not contain a function call.
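
For instance, every statement in the following fragment (a small illustrative sketch; x, A, i, and limit are assumed to be declared elsewhere, as in the fragments below) falls into those categories and therefore takes O(1) time:

x = A[i];                       /* structure access plus simple assignment */
x = x + 5;                      /* arithmetic operation */
if (x <= limit && i % 2 == 0)   /* comparison, logical, and arithmetic operations */
    return x;                   /* jump statement with no function call in its expression */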

In C, many for-loops are formed by initializing an index variable to some value and incrementing that variable by 1 each time around the loop. The for-loop ends when the index reaches some limit. For instance, the for-loop

for (i = 0; i < n-1; i++) 
{
    small = i;
    for (j = i+1; j < n; j++)
        if (A[j] < A[small])
            small = j;
    temp = A[small];
    A[small] = A[i];
    A[i] = temp;
}

uses index variable i. It increments i by 1 each time around the loop, and the iterations stop when i reaches n − 1.

However, for the moment, focus on the simple form of for-loop, where the difference between the final and initial values, divided by the amount by which the index variable is incremented, tells us how many times we go around the loop. That count is exact, unless there are ways to exit the loop via a jump statement; it is an upper bound on the number of iterations in any case.

For instance, the for-loop iterates ((n − 1) − 0)/1 = n − 1 times, since 0 is the initial value of i, n − 1 is the highest value reached by i (i.e., when i reaches n−1, the loop stops and no iteration occurs with i = n−1), and 1 is added to i at each iteration of the loop.
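
The same counting rule applies when the increment is larger than 1. For example (a small sketch, not from the original text), this loop goes around about (n − 0)/2 = n/2 times:

for (i = 0; i < n; i += 2)   /* roughly (final - initial) / increment = n/2 iterations */
    sum += A[i];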

In the simplest case, where the time spent in the loop body is the same for each iteration, we can multiply the big-oh upper bound for the body by the number of times around the loop. Strictly speaking, we must then add O(1) time to initialize the loop index and O(1) time for the first comparison of the loop index with the limit, because we test one more time than we go around the loop. However, unless it is possible to execute the loop zero times, the time to initialize the loop and test the limit once is a low-order term that can be dropped by the summation rule.


Now consider this example:

(1) for (j = 0; j < n; j++)
(2)   A[i][j] = 0;

We know that line (1) takes O(1) time. Clearly, we go around the loop n times, as we can determine by subtracting the lower limit from the upper limit found on line (1): the loop runs for j = 0, 1, ..., n − 1, which is n iterations. Since the body, line (2), takes O(1) time, we can neglect the time to increment j and the time to compare j with n, both of which are also O(1). Thus, the running time of lines (1) and (2) is the product of n and O(1), which is O(n).

Similarly, we can bound the running time of the outer loop consisting of lines (2) through (4), which is

(2) for (i = 0; i < n; i++)
(3)     for (j = 0; j < n; j++)
(4)         A[i][j] = 0;

We have already established that the loop of lines (3) and (4) takes O(n) time. Thus, we can neglect the O(1) time to increment i and to test whether i < n in each iteration, concluding that each iteration of the outer loop takes O(n) time.

The initialization i = 0 of the outer loop and the (n + 1)st test of the condition i < n likewise take O(1) time and can be neglected. Finally, we observe that we go around the outer loop n times, taking O(n) time for each iteration, giving a total O(n^2) running time.


A more practical example was attached as an image in the original answer (not reproduced here).
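
As a stand-in, here is a hedged example in the same spirit: a single function whose parts are bounded separately and then combined, giving O(1) + O(n) + O(n²) = O(n²).

/* Hypothetical example (not the original image): the nested loops dominate. */
int work(int A[], int n)
{
    int total = 0;                       /* O(1) */

    for (int i = 0; i < n; i++)          /* O(n): single pass over the array */
        total += A[i];

    for (int i = 0; i < n; i++)          /* O(n^2): n iterations ...         */
        for (int j = 0; j < n; j++)      /* ... each doing O(n) work         */
            if (i != j && A[i] == A[j])
                total++;

    return total;
}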

浮光初槿花落
#4 · 2018-12-31 00:27

If you want to estimate the order of your code empirically rather than by analyzing it, you could run it on a series of increasing input sizes and time it. Plot your timings against input size on a log-log scale. If the code is O(x^n) in the input size x, the points should fall on a line of slope n.

This has several advantages over just studying the code. For one thing, you can see whether you're in the range where the run time approaches its asymptotic order. Also, you may find that some code that you thought was order O(x) is really order O(x^2), for example, because of time spent in library calls.
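
A minimal sketch of that idea in C, using a hypothetical quadratic routine as the code under test: time it at doubling input sizes and check how the timings grow (on a log-log plot of time against input size, an O(n²) routine gives points near a line of slope 2).

#include <stdio.h>
#include <time.h>

/* hypothetical routine under test: does O(n^2) work */
static long quadratic(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            s += i ^ j;
    return s;
}

int main(void)
{
    for (long n = 1000; n <= 16000; n *= 2) {
        clock_t start = clock();
        volatile long r = quadratic(n);   /* volatile keeps the call from being optimized away */
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        (void)r;
        printf("n = %6ld  time = %.4f s\n", n, secs);
        /* on a log-log plot of time vs. n, these points should fall near a line of slope 2 */
    }
    return 0;
}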

步步皆殇っ
#5 · 2018-12-31 00:28

Less useful generally, I think, but for the sake of completeness there is also Big Omega Ω, which defines a lower bound on an algorithm's complexity, and Big Theta Θ, which defines both an upper and a lower bound.
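
For reference, the standard definitions (written here in LaTeX notation; f is the measured cost, g the bounding function) are:

f(n) = O(g(n))      \iff \exists c > 0,\, n_0 : f(n) \le c \cdot g(n) \text{ for all } n \ge n_0
f(n) = \Omega(g(n)) \iff \exists c > 0,\, n_0 : f(n) \ge c \cdot g(n) \text{ for all } n \ge n_0
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))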
