How many times update maximum gets called when try

2019-05-23 00:11发布

问题:

Suppose we have an array with integers -N to N in an array size of 2N + 1. We first shuffle the elements in the array, then try to find the maximum integer by iterating through the array from the first element to the last element: (code example is in Java)

int called = 0;
int max = Integer.MIN_VALUE;
for (int i : array) {
    if (i > max) {
        called++;
        max = i;
    }
}

What is the expectation(average over many runs) of value called after iterating through the array?

Edit:

How I found it to be close to ln(array.length):

public static void main(String args[]) {
    List<Integer> list = new ArrayList<>();
    for (int i = 0; i < 1000000; i++) list.add(i);

    int called = 0;
    int runs = 100;

    for (int i = 1; i <= runs; i++) {
        Collections.shuffle(list);
        int max = -1;
        for (int num : list) {
            if (num > max) {
                called++;
                max = num;
            }
        }
    }
    System.out.println("Expectation: " + called/runs);
}

回答1:

Let's consider randomly shuffled array. We want to estimate K - number of times than i-th element is bigger than all of its predecessors.

Expected value of K equals to sum of probabilities that i-th element is bigger than all its predecessors. E(K) = Σ12N+1 Pi.

Now we want to find Pi. Consider prefix of length i for our array. Probability that the last element in the prefix is the biggest is 1/i. So we have Pi = 1/i.

Consequently:E(K) = Σ12N+11/i. We can estimate this sum via definite integral as ln(2N+1)+O(1).

So, the answer is ln(2N+1)+O(1).

And Monte-Carlo simulation for kinda proof:

constexpr size_t N = 1000;
std::array<int, 2 * N + 1> arr;
std::iota(arr.begin(), arr.end(), -N);
std::random_device rd;
std::mt19937 g(rd());
double moving = 0;
for (double trial = 1; trial < 10001; ++trial) {
    std::shuffle(arr.begin(), arr.end(), g);
    int called = 0;
    int max = std::numeric_limits<int>::min();
    for (int i = 1; i < arr.size(); ++i) {
        if (arr[i] > max) {
            ++called;
            max = arr[i];
        }
    }
    if (trial > 1) {
        moving = moving * ((trial - 1) / trial) + called / trial;
    }
    else {
        moving = called;
    }
}
cout << "actual:   " << moving << endl;
cout << "expected: " << std::log(2 * N + 1) << " + O(1)" << endl;
actual:   8.1581   
expected: 7.6014 + O(1)


回答2:

Originally I came across this problem during a software engineer interview. Here is what I found that could be used to solve this problem:

Cycle Notation

Stirling numbers of the first kind

As Yola and rici have pointed out, the answer should be ln(n) + γ (where γ is the Euler-Mascheroni constant and n is the number of elements in the array).