Write a program to find the 100 largest numbers out of an array of 1 billion numbers

Published 2019-01-08 02:31

I recently attended an interview where I was asked "write a program to find 100 largest numbers out of an array of 1 billion numbers."

I was only able to give a brute-force solution, which was to sort the array in O(n log n) time complexity and take the last 100 numbers.

Arrays.sort(array);
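For reference, the full brute-force version might look like the sketch below (the array name and the use of Arrays.copyOfRange are illustrative assumptions, not code from the question):

import java.util.Arrays;

public class BruteForceTop100 {
    // Sort the whole array (O(n log n)) and copy out the last 100 entries.
    static int[] top100(int[] array) {
        Arrays.sort(array);
        return Arrays.copyOfRange(array, array.length - 100, array.length);
    }
}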

The interviewer was looking for a better time complexity; I tried a couple of other solutions but failed to answer him. Is there a better time complexity solution?

30 Answers
姐就是有狂的资本
#2 · 2019-01-08 02:53

If this is asked in an interview, I think the interviewer probably wants to see your problem-solving process, not just your knowledge of algorithms.

The description is quite general, so maybe you can ask him about the range or meaning of these numbers to make the problem clear. Doing this may impress the interviewer. If, for example, these numbers stand for people's ages within a country (e.g. China), then it's a much easier problem. With the reasonable assumption that nobody alive is older than 200, you can use an int array of size 200 (maybe 201) to count the number of people with the same age in just one iteration. Here the index means the age. After this it's a piece of cake to find the 100 largest numbers. By the way, this algorithm is called counting sort.
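To make the idea concrete, here is a minimal Java sketch of the counting-sort approach, assuming the values are ages in the range 0..200 (method and variable names are illustrative):

import java.util.ArrayList;
import java.util.List;

public class Top100Ages {
    // One pass to count how many people have each age, then walk down
    // from the oldest age collecting values until 100 have been gathered.
    static List<Integer> top100(int[] ages) {
        int[] counts = new int[201];        // index = age, value = number of people with that age
        for (int age : ages) {
            counts[age]++;
        }
        List<Integer> result = new ArrayList<>(100);
        for (int age = 200; age >= 0 && result.size() < 100; age--) {
            for (int c = 0; c < counts[age] && result.size() < 100; c++) {
                result.add(age);
            }
        }
        return result;
    }
}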

Anyway, making the question more specific and clearer is good for you in an interview.

你好瞎i
#3 · 2019-01-08 02:53

Inspired by @ron teller's answer, here is a barebones C program to do what you want.

#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

#define TOTAL_NUMBERS 1000000000
#define N_TOP_NUMBERS 100

/* qsort comparator: sorts ints in ascending order */
int
compare_function(const void *first, const void *second)
{
    int a = *((int *) first);
    int b = *((int *) second);
    if (a > b){
        return 1;
    }
    if (a < b){
        return -1;
    }
    return 0;
}

int 
main(int argc, char ** argv)
{
    if(argc != 2){
        printf("please supply a path to a binary file containing 1000000000"
               "integers of this machine's wordlength and endianness\n");
        exit(1);
    }
    FILE * f = fopen(argv[1], "rb");  /* open in binary mode */
    if(!f){
        exit(1);
    }
    int top100[N_TOP_NUMBERS];
    /* seed with INT_MIN so that negative input values are handled correctly */
    for (int i = 0; i < N_TOP_NUMBERS; i++){
        top100[i] = INT_MIN;
    }
    int sorts = 0;
    for (int i = 0; i < TOTAL_NUMBERS; i++){
        int number;
        int ok;
        ok = fread(&number, sizeof(int), 1, f);
        if(!ok){
            printf("not enough numbers!\n");
            break;
        }
        /* top100 is kept sorted ascending, so top100[0] is always the
           smallest of the 100 largest values seen so far */
        if(number > top100[0]){
            sorts++;
            top100[0] = number;
            qsort(top100, N_TOP_NUMBERS, sizeof(int), compare_function);
        }

    }
    printf("%d sorts made\n"
    "the top 100 integers in %s are:\n",
    sorts, argv[1] );
    for (int i = 0; i < N_TOP_NUMBERS; i++){
        printf("%d\n", top100[i]);
    }
    fclose(f);
    exit(0);
}

On my machine (Core i3 with a fast SSD) it takes 25 seconds and performs 1724 sorts. I generated a binary file with dd if=/dev/urandom count=1000000000 bs=1 for this run.

Obviously, there are performance issues with reading only 4 bytes at a time from disk, but this is for the sake of the example. On the plus side, very little memory is needed.

走好不送
#4 · 2019-01-08 02:54

Possible improvements.

If the file contains 1 billion numbers, reading it could take a really long time...

To improve on this, you can:

  • Split the file into n parts, create n threads, have each thread look for the 100 biggest numbers in its part of the file (using the priority queue), and finally merge the outputs of all threads to get the overall 100 biggest numbers (see the sketch after this list).
  • Use a cluster to do such a task, with a solution like Hadoop. Here you can split the file even further and get the output faster for a file of 1 billion (or 10^12) numbers.
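As a rough illustration of the first bullet, here is a minimal Java sketch; the chunking scheme, the use of an ExecutorService, and all names are assumptions for illustration, not part of the original answer:

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTop100 {
    // Each task finds the top 100 of its chunk with a size-bounded min-heap;
    // the partial results are then merged into one final top 100.
    static PriorityQueue<Integer> top100(int[] data, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<PriorityQueue<Integer>>> futures = new ArrayList<>();
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = Math.min(data.length, from + chunk);
            futures.add(pool.submit(() -> topOfRange(data, from, to)));
        }
        PriorityQueue<Integer> merged = new PriorityQueue<>();
        for (Future<PriorityQueue<Integer>> f : futures) {
            for (int value : f.get()) {
                offerBounded(merged, value);
            }
        }
        pool.shutdown();
        return merged;
    }

    private static PriorityQueue<Integer> topOfRange(int[] data, int from, int to) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int i = from; i < to; i++) {
            offerBounded(heap, data[i]);
        }
        return heap;
    }

    // Keep at most 100 elements; the head of the min-heap is the smallest kept so far.
    private static void offerBounded(PriorityQueue<Integer> heap, int value) {
        if (heap.size() < 100) {
            heap.offer(value);
        } else if (value > heap.peek()) {
            heap.poll();
            heap.offer(value);
        }
    }
}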
做自己的国王
#5 · 2019-01-08 02:55

You can iterate over the numbers, which takes O(n).

Whenever you find a value greater than the current minimum, add the new value to a circular queue with size 100.

The min of that circular queue is your new comparison value. Keep on adding to that queue. If full, extract the minimum from the queue.

放我归山
#6 · 2019-01-08 02:55

Two options:

(1) Heap (PriorityQueue)

Maintain a min-heap of size 100. Traverse the array. Whenever an element is larger than the first element (the minimum) of the heap, replace that minimum with it.

Insert an element into the heap: O(log 100)
Compare against the first element: O(1)
There are n elements in the array, so the total would be O(n log 100), which is O(n).
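A minimal Java sketch of this approach, using java.util.PriorityQueue as the min-heap (class and variable names are illustrative):

import java.util.PriorityQueue;

public class HeapTop100 {
    // Min-heap of size 100: the head is the smallest of the 100 largest seen so far.
    static PriorityQueue<Integer> top100(int[] numbers) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(100);
        for (int n : numbers) {
            if (heap.size() < 100) {
                heap.offer(n);        // fill the heap with the first 100 elements
            } else if (n > heap.peek()) {
                heap.poll();          // drop the current minimum, O(log 100)
                heap.offer(n);        // insert the larger element, O(log 100)
            }
        }
        return heap;                  // holds the 100 largest numbers
    }
}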

(2) Map-reduce model.

This is very similar to the word-count example in Hadoop. Map job: count every element's frequency (the number of times it appears). Reduce job: get the top K elements.

Usually, I would give the recruiter both answers and let them pick whichever they like. Of course, MapReduce coding would be laborious because you have to know all the exact parameters. There is no harm in practicing it. Good luck.

放荡不羁爱自由
#7 · 2019-01-08 02:55

The complexity is O(N).

First create an array M of 100 ints. Initialize the first element of this array as the first element of the N values, and keep track of the index of the current element with another variable, call it CurrentBig.

Iterate through the N values:

if (N[i] > M[CurrentBig]) {
    M[CurrentBig] = N[i];  // overwrite the current value with the newly found larger number
    CurrentBig++;          // go to the next position in the M array
    CurrentBig %= 100;     // modulo arithmetic saves you from using lists/hashes etc.
    M[CurrentBig] = N[i];  // pick up the current value again to use it for the next iteration over the N array
}

When done, print the M array starting from CurrentBig, 100 times, wrapping around modulo 100. :-) For the student: make sure that the last line of the code does not clobber valid data right before the code exits.
