I can use the median of medians selection algorithm to find the median in O(n). Also, I know that after the algorithm is done, all the elements to the left of the median are less that the median and all the elements to the right are greater than the median. But how do I find the k nearest neighbors to the median in O(n) time?
If the median is n, the numbers to the left are less than n and the numbers to the right are greater than n. However, the array is not sorted in the left or the right sides. The numbers are any set of distinct numbers given by the user.
The problem is from Introduction to Algorithms by Cormen, problem 9.3-7
You already know how to find the median in O(n)
if the order does not matter, selection of k smallest can be done in O(n) apply for k smallest to the rhs of the median and k largest to the lhs of the median
from wikipedia
don't forget the special case where k==n return the original list
Four Steps:
First select the median in
O(n)
time, using a standard algorithm of that complexity. Then run through the list again, selecting the elements that are nearest to the median (by storing the best known candidates and comparing new values against these candidates, just like one would search for a maximum element).In each step of this additional run through the list O(k) steps are needed, and since k is constant this is O(1). So the total for time needed for the additional run is O(n), as is the total runtime of the full algorithm.
You could use a non-comparison sort, such as a radix sort, on the list of numbers
L
, then find the k closest neighbors by considering windows of k elements and examining the window endpoints. Another way of stating "find the window" is find i that minimizesabs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i] - L[n/2])
(if k is odd) orabs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i+1] - L[n/2])
(if k is even). Combining the cases,abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i+!(k&1)] - L[n/2])
. A simple, O(k) way of finding the minimum is to start with i=0, then slide to the left or right, but you should be able to find the minimum in O(log(k)).The expression you minimize comes from transforming
L
into another list,M
, by taking the difference of each element from the median.i
minimizesM[n/2-k/2+i] + M[n/2+k/2+i]
.Since all the elements are distinct, there can be atmost 2 elements with the same difference from the mean. I think it is easier for me to have 2 arrays A[k] and B[k] the index representing the absolute value of the difference from the mean. Now the task is to just fill up the arrays and choose k elements by reading the first k non empty values of the arrays reading A[i] and B[i] before A[i+1] and B[i+1]. This can be done in O(n) time.