Suppose we have an array with integers -N to N in an array size of 2N + 1. We first shuffle the elements in the array, then try to find the maximum integer by iterating through the array from the first element to the last element: (code example is in Java)
int called = 0;
int max = Integer.MIN_VALUE;
for (int i : array) {
if (i > max) {
called++;
max = i;
}
}
What is the expectation(average over many runs) of value called
after iterating through the array?
Edit:
How I found it to be close to ln(array.length):
public static void main(String args[]) {
List<Integer> list = new ArrayList<>();
for (int i = 0; i < 1000000; i++) list.add(i);
int called = 0;
int runs = 100;
for (int i = 1; i <= runs; i++) {
Collections.shuffle(list);
int max = -1;
for (int num : list) {
if (num > max) {
called++;
max = num;
}
}
}
System.out.println("Expectation: " + called/runs);
}
Let's consider randomly shuffled array. We want to estimate K - number of times than i-th element is bigger than all of its predecessors.
Expected value of K equals to sum of probabilities that i-th element is bigger than all its predecessors. E(K) = Σ12N+1 Pi.
Now we want to find Pi. Consider prefix of length i for our array. Probability that the last element in the prefix is the biggest is 1/i. So we have Pi = 1/i.
Consequently:E(K) = Σ12N+11/i. We can estimate this sum via definite integral as ln(2N+1)+O(1).
So, the answer is ln(2N+1)+O(1).
And Monte-Carlo simulation for kinda proof:
constexpr size_t N = 1000;
std::array<int, 2 * N + 1> arr;
std::iota(arr.begin(), arr.end(), -N);
std::random_device rd;
std::mt19937 g(rd());
double moving = 0;
for (double trial = 1; trial < 10001; ++trial) {
std::shuffle(arr.begin(), arr.end(), g);
int called = 0;
int max = std::numeric_limits<int>::min();
for (int i = 1; i < arr.size(); ++i) {
if (arr[i] > max) {
++called;
max = arr[i];
}
}
if (trial > 1) {
moving = moving * ((trial - 1) / trial) + called / trial;
}
else {
moving = called;
}
}
cout << "actual: " << moving << endl;
cout << "expected: " << std::log(2 * N + 1) << " + O(1)" << endl;
actual: 8.1581
expected: 7.6014 + O(1)
Originally I came across this problem during a software engineer interview. Here is what I found that could be used to solve this problem:
Cycle Notation
Stirling numbers of the first kind
As Yola and rici have pointed out, the answer should be ln(n) + γ (where γ is the Euler-Mascheroni constant and n is the number of elements in the array).