可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have an array which looks like this:

int array[] = {4.53, 3.65, 7.43, 9.54, 0.72, 0.0}

I am just wondering what method I can use to partially sort this array to bring the top three biggest doubles to the front. I am looking for the most efficient method to get the top three highest numbers in this array.

So far I have been using qsort, but I am just looking for another method to do this which could be even faster. I know that qsort is O(nlogn) for best cases and O(n^2) for worst cases, but is there an even more efficient method to achieve this problem? What I mean by efficient is just a faster way to do it, better than O(nlogn).

Any help would be great

回答1:

Simply maintain first, second, third.

   first =  array[0];
   second = array[1];
   third = array[2];

   /* scratch sort for three elements */
   if(first < second)
     swap(first, second);
  if(first < third)
     swap(first, third);
  if(second < third)
     swap(second, third);

  /* now go through, bubbling up if we have a hit */ 
  for(i=3;i<N;i++)
  {
      if(third < array[i])
      {
         third = array[i];
         if(second < third)
         {
            swap(second, third);
            if(first < second)
              swap(first, second);
         }
      }
  }

I wouldn't try to scale up to k = four. I think three is about the limit for hardcoding it. As k get large you need to move to a formal method.

This doesn't answer the question you actually asked, which was how to partially sort, but it seems to be what you want.

If you wish to partially sort, you can use quicksort, and simply return early when the pivot goes above the bound you are interested it. So our first pivot divides into five, two. Ignore the last two, and only actually do the sub-sorts of the last five. But whilst it will be faster than quicksort, it won't be a game changer. If you can get a conservative upper bound on the k'th item (eg it's always going to be at most 25% between the minimum and the mean) you can quickly eliminate most of the data. If you get it wrong it's just another pass or two.

Using the quicksort method

  int sortfirstk_r(int *array, int N, int k)
  {
     int pivot = 0;
     int j = n -1;
     int i = 1;

     while(i <= j)
     {
        if(array[pivot] < array[i])
          swap(array[i], array[j--])
        else
          i++;

     }
     sortfirstk_r(array, i, k < i ? k : i);
     if(i < k)
       sortfirstk_r(array +i, N -i, k - i); 

  }

(Untested and there might be bugs in the slightly tricky sort logic).

However we've naively used the first element as the pivot. If we're sorting a large data set, and it has a normal distribution, and we want the top 1%, the z-score is 2.326. Take a bit more to allow us some sampling error, and we make a first pass with a pivot set at say 2.3 standard deviations above the mean. Then we split the distribution into two sets, the top 1% plus a bit, and the rest. We don't need to further process the rest, and just sort the top group.

回答2:

For your specific problem the quickest method is to do something similar to below since you only want three elements: (It may be quicker to use a priority queue or a different data structure, but the speed will not be very noticeable)

#include"stdio.h"
void moveThreeMaxToFront(double * arr, int length);
void moveMaxToFront(double*arr, int length);
int main() {
  int i;
  double meh[]={ 5,3,1,7,2,9,11};
  moveThreeMaxToFront(meh, 7);
  for(i=0; i<7; i++)
    printf("%f \n", meh[i]);
}
void moveThreeMaxToFront(double * arr, int length) {
  for(int i=0; i<3; i++)
    moveMaxToFront(arr++, length-i);
}
void moveMaxToFront(double* arr, int length) {
  int i;
  for(i=1; i<length; i++) {
    if(arr[i]>arr[0]) {
      double tmp=arr[i];
      arr[i]=arr[0];
      arr[0]=tmp;
    }
  }
}

However, it is potentially faster if k becomes substantially larger to either implement Quickselect or use the partial_sort method which I believe implements quickselect. However, the quickselect algorithm for the given case has an average constant of approximately 3.4-4.4 which is slightly larger than the constant above(3). Please also note that quickselect has an average run time of O(n). This run time can be guaranteed using median of 3, but this is not advised as it significantly increases the average constant. Intro-select properly handles this to prevent the worst case of quickselect while retaining its average case.

回答3:

I would suggest radix sort it is the most efficient sorting method for such cases and has complexity O(n). You could even change it alittle bit to stop when find three max numbers. You can find-understand radix short: https://www.cs.usfca.edu/~galles/visualization/RadixSort.html

回答4:

If we are supposed to find out the three largest number then we can run findMax method three times and once a maximum is found replace appropriate index (1, 2 or 3) with maximum in array. This way we leave you with array will 3 largest elements at start of array in c * O(n) time-complexity.

Note: I used fact that you have to find first three maximum doubles

double findMax(double arr[i], double prevMax){
    double maximum = -100000000000;
    for(int i = 0; i < arr.length; i++){
        if(arr[i] < prevMax)
        maximum = max(arr[i], maximum);
    }
    return maximum;
 }