Someone asked me a brainteaser, and I don't know how to answer it; my knowledge stops at amortized analysis, and in any case the running time here is clearly O(n).
public int findMax(int[] array) {
    int count = 0;
    int max = array[0];
    for (int i = 0; i < array.length; i++) {
        if (array[i] > max) {
            count++;
            max = array[i];
        }
    }
    return count;
}
What's the expected value of count for an array of size n? The numbers are picked at random from a uniform distribution.
Let f(n) be the average number of assignments inside the loop.

If the last element is not the largest, the loop behaves just as it would on the first n-1 elements, so f(n) = f(n-1). If the last element is the largest, it causes one extra assignment: f(n) = f(n-1) + 1.

Since the last number is the largest with probability 1/n, and not the largest with probability (n-1)/n, we have:

    f(n) = (1/n) * (f(n-1) + 1) + ((n-1)/n) * f(n-1)

Expand and collect terms to get:

    f(n) = f(n-1) + 1/n

And f(1) = 0. So:

    f(n) = 1/2 + 1/3 + ... + 1/n
That is, f(n) is the n-th "Harmonic number", which you can get in closed form only approximately. (Well, one less than the n-th Harmonic number. The problem would be prettier if you initialized max to INT_MIN and just let the loop run, so that f(1) = 1.)

The above is not a rigorous proof, since I was sloppy about expected values versus actual values. But I believe the answer is right anyway :-).
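As a sanity check, here is a quick simulation (my own sketch, not part of the original answer) that counts assignments over many random arrays and compares the empirical average to 1/2 + 1/3 + ... + 1/n:

```java
import java.util.Random;

// Sketch: empirically verify f(n) = H_n - 1 for random arrays of distinct doubles.
public class HarmonicCheck {

    // Same counting as the question's findMax: assignments inside the loop only.
    static int countAssignments(double[] a) {
        int count = 0;
        double max = a[0];
        for (int i = 1; i < a.length; i++) { // a[0] > max is never true, so start at 1
            if (a[i] > max) {
                count++;
                max = a[i];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 100, trials = 200_000;
        Random rng = new Random(42);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            double[] a = new double[n];
            for (int i = 0; i < n; i++) a[i] = rng.nextDouble();
            total += countAssignments(a);
        }
        double harmonicMinusOne = 0.0;
        for (int k = 2; k <= n; k++) harmonicMinusOne += 1.0 / k;
        // For n = 100, H_n - 1 is about 4.187; the empirical average should be close.
        System.out.printf("empirical %.3f vs H_n - 1 = %.3f%n",
                (double) total / trials, harmonicMinusOne);
    }
}
```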
I am assuming all elements are distinct and counting the initial assignment to max outside the for loop.

If the array is sorted in increasing order, the variable max gets assigned exactly n times (each time it gets a greater value).

If the array is sorted in decreasing order, the variable max gets assigned exactly once (it gets assigned the first time, and all subsequent values are smaller).

Edit: My formulation for a randomly permuted array was actually wrong, as pointed out in the comments. I think @Nemo posts the correct answer to this.
I think just counting the number of assignments is not really a true measure of the cost of this function: whether or not we actually update the value of max, we perform exactly n comparisons. So fewer assignments does not really imply less work done.

Also observe that there are no swaps being done, only assignments and comparisons.
You can actually take this analysis a step further when the value of each item comes from a finite set. Let E(N, M) be the expected number of assignments when finding the max of N elements drawn uniformly from an alphabet of size M. An assignment happens at position N exactly when that element is strictly greater than all N-1 before it, so we can say:

    E(N, M) = E(N-1, M) + (1/M) * sum over v from 1 to M of ((v-1)/M)^(N-1),  with E(1, M) = 0

This is a bit hard to bring to a closed form, but we can be sure that E(N, M) is in O(log(min(N, M))). This is because E(N, INF) is in THETA(log(N)), as the harmonic sum grows proportionally to the log function, and E(N, M) < E(N, M+1). Likewise, when M < N we have E(N, M) < E(M, INF), as there are at most M unique values.
And here's some code to compute E(N, M) yourself. I wonder if anyone can get this to a closed form?
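One way to compute it (my own sketch, not necessarily the answer's original code): an assignment happens at position n exactly when that element is strictly greater than all n-1 before it, which for a uniform alphabet of size M has probability (1/M) * sum over v of ((v-1)/M)^(n-1). Summing that over n = 2..N gives E(N, M), assuming the initial assignment is not counted:

```java
// Sketch: compute E(N, M) numerically. Assumes the initial assignment to max
// is NOT counted, so E(1, M) = 0 (matching the harmonic-number answer above).
public class ExpectedAssignments {

    // Probability that element n (1-indexed) strictly exceeds all n-1 before it,
    // when all elements are uniform over {1, ..., m}.
    static double newMaxProbability(int n, int m) {
        double p = 0.0;
        for (int v = 1; v <= m; v++) {
            p += Math.pow((v - 1) / (double) m, n - 1);
        }
        return p / m;
    }

    // E(N, M) = sum over n = 2..N of the probability above.
    static double expected(int n, int m) {
        double e = 0.0;
        for (int k = 2; k <= n; k++) {
            e += newMaxProbability(k, m);
        }
        return e;
    }

    public static void main(String[] args) {
        // As M grows, E(N, M) should approach H_N - 1 (about 1.9290 for N = 10).
        System.out.println(expected(10, 1_000_000));
    }
}
```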
I would like to comment on Nemo's answer, but I don't have the reputation to comment. His correct answer can be simplified: the chance that the second number is larger than the first is 1/2. Independently of that, the chance that the 3rd number is larger than the two before it is 1/3, and in general the k-th number is the largest of the first k with probability 1/k. These are all independent events, and by linearity of expectation the total is therefore

    1/2 + 1/3 + 1/4 + ... + 1/n
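The 1/k claim is easy to check numerically. A small simulation (my own sketch) that estimates, for each position k, the probability that the k-th number is a new maximum:

```java
import java.util.Random;

// Sketch: estimate P(k-th element is the largest of the first k) for uniform doubles.
public class RecordCheck {

    static double[] recordFrequencies(int n, int trials, long seed) {
        Random rng = new Random(seed);
        int[] records = new int[n];
        for (int t = 0; t < trials; t++) {
            double max = Double.NEGATIVE_INFINITY;
            for (int k = 0; k < n; k++) {
                double x = rng.nextDouble();
                if (x > max) { // position k (0-based) holds a new maximum
                    records[k]++;
                    max = x;
                }
            }
        }
        double[] freq = new double[n];
        for (int k = 0; k < n; k++) freq[k] = records[k] / (double) trials;
        return freq;
    }

    public static void main(String[] args) {
        double[] freq = recordFrequencies(5, 1_000_000, 1);
        for (int k = 0; k < freq.length; k++) {
            // Expect roughly 1 / (k + 1): 1, 0.5, 0.333..., 0.25, 0.2
            System.out.printf("position %d: %.4f%n", k + 1, freq[k]);
        }
    }
}
```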