Efficient way to count occurrences of a key in a s

2020-05-15 11:50发布

站内文章 / 移动开发

9 0

贼婆χ

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

This was asked in on-site Microsoft interview.

Count the number of occurrences of a given key in an array.

I answered linear search because the elements may be scattered in the array. Say the key is found in the beginning and at the end. So we need to scan the entire array.

Next he asked what if the array is sorted?

Thought for a while and said I'll use linear search again. Because the repetitions of the key if present can be anywhere in the array. As an optimization I also said if first and last array elements are same you can take the array length as the answer.

Is my analysis correct in both the cases?

Example:

Input = [0 0 1 1 1 2 2 3 3], key = 1, Answer = 3
Input = [0 0 2 2 3 3],       key = 1, Answer = 0

回答1:

For unsorted array there is not much we can do other than linear search.

For sorted array you can do it in O(logN) using a slightly modified binary search:

Find the index of first occurrence of key, call it f.
Find the index of last occurrence of key, call it l.
If the key exists in the array l-f+1 is the answer.

Finding the first occurrence:

arr[i] is the first occurrence of key iff

arr[i] == key and either
- i == 0 ( it's the first element of the array) or
- arr[i-1] != key (it's not the first element of the array and element to it's left is different)

You can slightly modify the binary search to find the first occurrence.
In a binary search you terminate the search when you find arr[mid] == key.
Modify the condition such that you terminate the search when you find the first occurrence instead of any occurrence.

Algorithm:

low = 0
high = arrSize - 1 

while low <=  high

  mid = (low + high) / 2

  //if arr[mid] == key         // CHANGE
  if arr[mid] == key AND ( mid == 0 OR arr[mid-1] != key )
    return mid
  //else if ( key < arr[mid] ) // CHANGE
  else if ( key <= arr[mid] ) 
    high = mid - 1
  else
    low = mid + 1        
  end-if

end-while

return -1

Similarly you can find the last occurrence.

回答2:

For once, I will propose an implementation in C++.

size_t count(std::vector<int> const& vec, int key)
{
  auto p = std::equal_range(vec.begin(), vec.end(), key);
  return std::distance(p.first, p.second);
}

equal_range uses a binary search, the result is equivalent to:

std::make_pair(std::lower_bound(vec.begin(), vec.end(), key),
               std::upper_bound(vec.begin(), vec.end(), key));

but the implementation should makes it slightly faster, even though all are in O(log N) (in terms of number of comparison).

回答3:

#include<stdio.h>
int binarysearch(int a[],int n,int k,bool searchfirst){
    int result=-1;
    int low=0,high=n-1;
    while(low<=high){
        int mid=(low+high)/2;
        if(a[mid]==k)  {
              result=mid; 
           if(searchfirst)
              high=mid-1; 
            else
              low=mid+1;
    }
    else if(k<a[mid])  high=mid-1;
    else low=mid+1;
    }
    return result;
}

int main(){
    int a[]={1,1,1,2,2,3,3,3,6,6,6,6,6,7,7};
    int n=sizeof(a)/sizeof(a[0]);
    int x=6;
    int firstindex=binarysearch(a,n,x,true);
    printf("%d\n",firstindex);
    if(firstindex==-1){
        printf("elment not found in the array:\n ");
    }
    else {
        int lastindex=binarysearch(a,n,x,false);
        printf("%d\n",lastindex);
        printf("count is = %d", lastindex-firstindex+1);
    }

}

回答4:

You can use recursive version of binary search

int modifiedbinsearch_low(int* arr, int low, int high , int key)
{   
    if(low==high) return high ; 

    int mid = low + (high-low) /2;

    if(key >  arr[mid] ) { modifiedbinsearch_low(arr,mid + 1 , high,key);  } 
    else  { modifiedbinsearch_low(arr,low,mid,key);  }  
}
int modifiedbinsearch_high(int* arr, int low, int high , int key)
{   
    if(low==high) return high ; 

    int mid = low + (high-low) /2;

    if(key <  arr[mid] ) { modifiedbinsearch_high(arr,low,mid,key);  } 
    else  { modifiedbinsearch_high(arr,mid+1,high,key);  } 

}

int low = modifiedbinsearch_low( ...)

int high = modifiedbinsearch_high( ...)

(high - low) gives the number of keys

回答5:

** Time Complexity = O ( lg N ) where N is the size of array

** Arguments for binarySearchXXXXX:**

int[] array is a sorted array of length >= 1
int k : key to search

package array;

 public class BinarySearchQuestion {

public static int binarySearchFirst(int[] array, int k) {
    int begin = 0;
    int end = array.length-1;
    int mid = -1;
    while (begin <= end) {
        mid = begin + (end - begin) / 2;
        if (array[mid] < k) {
            begin = mid + 1;
        } else {
            end = mid - 1;
        }
    }
    //System.out.println("Begin index :: " + begin + " ,  array[begin] " + array[begin]);
    return (begin <= array.length - 1  && begin >= 0 && array[begin] != k) ? -1 : begin;
    //      return begin;
}

public static int binarySearchLast(int[] array, int k) {
    int begin = 0;
    int end = array.length - 1;
    int mid = -1;
    while (begin <= end) {
        mid = begin + (end - begin) / 2;
        if (array[mid] > k) {
            end = mid - 1;
        } else {
            begin = mid + 1;
        }
    }
    //System.out.println("Last index end :: " + end + " ,  array[mid] " + array[end]);
    return (end <= array.length - 1  && end >= 0 &&  array[end] != k) ? -1 : end;
    //return end;
}

/**
 * @param args
 */
public static void main(String[] args) {
             //     int[] array = { 0, 1,1,1, 2, 3, 4,4,4,5, 5, 5, 5, 5, 5, 5, 5, 5, 5,5,6,6,6,6, 6, 7, 7, 7,
             //             7, 8, 9 };
            //      int[] array = {-1, 0,1, 1,1,2,3};
    int[] array = {1,1,1};

    int low = binarySearchFirst(array, 1);
    int high = binarySearchLast(array, 1);
    int total = (high >= low && low != -1 && high != -1) ? ( high - low + 1 ): 0;
    System.out.println("Total Frequency " + total);
}

   }

回答6:

How about this for the sorted part, with O(logN) time complexity?

int count(int a[], int k, int l, int h) {
  if (l>h) {
    return 0;
  }
  int mid = (l+h)/2;
  if (k > a[mid]) {
     return count(a, k, mid+1, h);
  }
  else if (k < a[mid]) {
     return count(a, k, l, mid-1);
  }
  else {
     return count(a, k, mid+1, h) + count(a, k, l, mid-1) + 1;
  }
}

回答7:

package arrays;

/* * Given a sorted array, find the number of times an element occured. * Binary search O(lgn) * */

public class NumberOfN {

static int bSearchLeft(int[] arr, int start, int end, int n){

    while(start < end){

        int mid = (start + end)>>1;
        if(arr[mid] < n){
            start = mid + 1;
        }else{
            end = mid;
        }

    }

    return end;
}

static int bSearchRight(int[] arr, int start, int end, int n){

    while(start < end){

        int mid = (start + end)>>1;
        if(arr[mid] <= n){
            start = mid + 1;
        }else{
            end = mid;
        }

    }

    return end;
}

/**
 * @param args
 */
public static void main(String[] args) {

    int[] arr = new int[]{3,3,3,3};
    int n = 3;
    int indexLeft = bSearchLeft(arr, 0, arr.length, n);
    int indexRight = bSearchRight(arr, 0, arr.length, n);
    System.out.println(indexLeft + " " +indexRight);
    System.out.println("Number of occurences: " + (indexRight - indexLeft));
}

}

回答8:

We can solve this using Linear as well as Binary Search. But Linear Search will be O(n). Binary Search will give O(Logn). Hence it's better to use Binary search. The complete program is :

public class Test4 {
public static void main(String[] args) {
     int a[] = {1, 2, 2, 3, 3, 3, 6,6,6,6,6,66,7}; 
     int x =  6; 

         System.out.println(fix(a,x));
}

private static int fix(int[] a, int x) {
    int res = 0 ;

    for (int i = 0; i < a.length; i++) {
        int ch = a[i];
        if(x == ch) {res++ ;}
    }
    return res;
}
}

There is another follow up question that's asked is : 1st and last occurence of a given number in a sorted array.

class Occurence1 {

    public static void findFirstAndLast(int a[], int x) {

        int first = -1, last = -1;
        for (int i = 0; i < a.length; i++) {
            if (x == a[i]) {
                if (first == -1) {
                    first = i;
                }
                // update last
                last = i;
            } // if

        } // for                                                                           
        if (first != -1) {
            System.out.println("First Occurrence = " + first);
            System.out.println("Last Occurrence = " + last);
        } 
    }// end1

    public static void main(String[] args) {
        int arr[] = { 1, 2, 2, 2, 2, 3, 4, 7, 8, 8 };
        int x = 8;
        findFirstAndLast(arr, x);
    }
}

In Python :

def findFirstAndLast(a, x):
    first = -1 ; last = -1
    for i in range(len(a)) :
        if(x == a[i]): 
            if(first == -1):first = i 

         # update last if the first contains oter value than -1    
        last = i

    if(first != -1):
        print("first => ",first)
        print("last =>", last)       


a = [1, 2, 3,4, 5, 6, 7, 8, 1, 10, 10]
x = 10
findFirstAndLast(a, x)

回答9:

If the array is unsorted, yes, linear search from one end to the other is as good as it gets.

However, if the array is sorted, you can do better than linear time by applying binary or interpolation search techniques.

Treat the problem as the same as "Find the number X in a sorted list" with the added detail of "then scan left and right to determine how many times X appears". The first part, the search, is best done with binary or interpolation search in most cases.

http://en.wikipedia.org/wiki/Interpolation_search

http://en.wikipedia.org/wiki/Binary_search

回答10:

Yes, you're right for unsorted array, but for sorted array you can use binary search to locate one occurence of the element and once that one occurence is found only scan adjacent elements until you find mismatches and then stop.