I recently attended an interview where I was asked "write a program to find 100 largest numbers out of an array of 1 billion numbers."
I was only able to give a brute force solution which was to sort the array in O(nlogn) time complexity and take the last 100 numbers.
Arrays.sort(array);
The interviewer was looking for a better time complexity, I tried a couple of other solutions but failed to answer him. Is there a better time complexity solution?
This question would be answered with N log(100) complexity (instead of N log N) with just one line of C++ code.
The final answer would be a vector where the first 100 elements are guaranteed to be the 100 biggest numbers of you array while the remaining elements are unordered
C++ STL (standard library) is quite handy for this kind of problems.
Note: I am not saying that this is the optimal solution, but it would have saved your interview.
I would find out who had the time to put a billion numbers into an array and fire him. Must work for government. At least if you had a linked list you could insert a number into the middle without moving half a billion to make room. Even better a Btree allows for a binary search. Each comparison eliminates half of your total. A hash algorithm would allow you to populate the data structure like a checkerboard but not so good for sparse data. As it is your best bet is to have a solution array of 100 integers and keep track of the lowest number in your solution array so you can replace it when you come across a higher number in the original array. You would have to look at every element in the original array assuming it is not sorted to begin with.
Another O(n) algorithm -
The algorithm finds the largest 100 by elimination
consider all the million numbers in their binary representation. Start from the most significant bit. Finding if the MSB is 1 can be a done by a boolean operation multiplication with an appropriate number. If there are more than 100 1's in these million eliminate the other numbers with zeros. Now of the remaining numbers proceed with the next most significant bit. keep a count of the number of remaining numbers after elimination and proceed as long as this number is greater than 100.
The major boolean operation can be an parallely done on GPUs
An very easy solution would be to iterate through the array 100 times. Which is
O(n)
.Each time you pull out the largest number (and change its value to the minimum value, so that you don't see it in the next iteration, or keep track of indexes of previous answers (by keeping track of indexes the original array can have multiple of the same number)). After 100 iterations, you have the 100 largest numbers.
Please note esp. the second step might be easy to compute in parallel! And it will also be efficiently when you need a million biggest elements.
You can use Quick select algorithm to find the number at the(by order) index [billion-101] and then iterate over the numbers and to find the numbers that biger from that number.
This algorithm Time is: 2 X O(N) = O(N) (Average case performance)
The second option like Thomas Jungblut suggest is:
Use Heap building the MAX heap will take O(N),then the top 100 max numbers will be in the top of the Heap, all you need is to get them out from the heap(100 X O(Log(N)).
This algorithm Time is:O(N) + 100 X O(Log(N)) = O(N)