Limit input data to achieve a better Big O complexity

Posted 2019-01-29 13:54

Question:

You are given an unsorted array of n integers, and you would like to find if there are any duplicates in the array (i.e. any integer appearing more than once). Describe an algorithm (implemented with two nested loops) to do this.

The question that I am stuck on is: How can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find if there are any duplicates. What is the Big O complexity?

Your help will be greatly appreciated. This is not related to my coursework or assignments; it's from a previous year's exam paper, and I am doing some self-study but seem to be stuck on this question. The only possible solution that I could come up with is:

If we limit the data and use nested loops to find duplicates, the complexity would be O(n), simply because the time the operations take is proportional to the data size.

If my answer makes no sense, please ignore it, and if you can, please suggest possible solutions or working for this question.

If someone could help me solve this, I would be grateful, as I have attempted countless possible solutions, all of which seem to be incorrect.

Edited again. Another possible solution (if it works!):
We could implement a loop that sorts the array (from lowest integer to highest), so that any duplicates end up right next to each other, making them easier and faster to identify.
The Big O complexity would still be O(n^2).
Since this is of the linear type, it would simply use the first loop and iterate n-1 times, taking the index in the array (in the first iteration it could be, for instance, 1) and storing it in a variable named 'current'.
The outer loop updates 'current' by +1 on each iteration; inside it, another loop compares the current number to the next number. If they are equal, we can report it with a printf statement; otherwise we return to the outer loop, advance 'current' by +1 (the next value in the array), and update 'next' to hold the value after 'current'.
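For reference, here is a minimal Java sketch of that sort-then-compare-adjacent idea. The method name is just illustrative, and it uses the library sort (which is O(n log n)) rather than the hand-written sorting loop described above:

import java.util.Arrays;

// Sort a copy first, then any duplicates end up adjacent.
static boolean containsDuplicateSorted(int[] arr) {
    int[] copy = Arrays.copyOf(arr, arr.length); // leave the original array untouched
    Arrays.sort(copy);                           // library sort, O(n log n)
    for (int i = 0; i + 1 < copy.length; i++) {
        if (copy[i] == copy[i + 1]) return true; // adjacent equal values -> duplicate
    }
    return false;
}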

Answer 1:

You can do it linearly (O(n)) for any input if you use a hash table (which has constant expected look-up time).
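For reference, a minimal sketch of that hash-table approach (the method name is just illustrative):

import java.util.HashSet;

// add() returns false when the value is already present, i.e. a duplicate.
static boolean containsDuplicate(int[] arr) {
    HashSet<Integer> seen = new HashSet<>();
    for (int v : arr) {
        if (!seen.add(v)) return true; // expected O(1) per lookup/insert
    }
    return false;
}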

However, this is not what you are being asked about.

By limiting the possible values in the array, you can achieve linear performance.

E.g., if your integers have range 1..L, you can allocate a bit array of length L, initialize it to 0, and iterate over your input array, checking and then setting the appropriate bit for each value; if a bit is already set, you have found a duplicate.
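A minimal sketch of that idea, assuming all values lie in 1..L (the method name is just illustrative; a BitSet plays the role of the bit array):

import java.util.BitSet;

static boolean containsDuplicateInRange(int[] arr, int L) {
    BitSet seen = new BitSet(L + 1);  // bits 0..L, all initially clear
    for (int v : arr) {
        if (seen.get(v)) return true; // bit already set -> duplicate
        seen.set(v);                  // mark this value as seen
    }
    return false;
}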



Answer 2:

A variant of bucket sort will do. This gives you a complexity of O(n), where n is the number of input elements.

But there is one restriction: the maximum value. You need to know the maximum value your integer array can take; let's call it m.

The idea is to create a boolean array of size m+1 (all initialized to false). Then iterate over your array: as you visit each element, set bucket[element] to true. If it is already true, you have encountered a duplicate.

Java code:



// alternatively, you can iterate over the array to find maxVal, which is again O(n).
// assumes all values are in the range 0..maxVal
public boolean findDup(int[] arr, int maxVal)
{
    // Java initializes all elements of a boolean array to false.
    boolean[] bucket = new boolean[maxVal + 1];

    for (int elem : arr)
    {
        if (bucket[elem])
        {
            return true; // a duplicate found
        }
        bucket[elem] = true;
    }
    return false;
}

But the constraint here is space: you need O(maxVal) extra space.
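For illustration, a quick usage sketch (the input values are made up, and findDup is assumed to be static or called on an instance of its class):

int[] data = {3, 1, 4, 1, 5};     // example input, values in 0..5
boolean dup = findDup(data, 5);   // true: 1 appears twice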



Answer 3:

Nested loops get you O(N*M) or O(N*log(M)); for O(N) you cannot use nested loops!

I would do it with a histogram instead:

DWORD in[N]={ ... }; // input data ... values are from the range [0, M)
DWORD his[M]={ ... }; // histogram of in[]
int i,j;

// compute histogram O(N)
for (i=0;i<M;i++) his[i]=0;     // this can be done also by memset ...
for (i=0;i<N;i++) his[in[i]]++; // if the range of values is not from 0 then shift it ...

// remove duplicates O(N)
for (i=0,j=0;i<N;i++)
 {
 his[in[i]]--;           // count down this value's remaining occurrences
 in[j]=in[i];            // copy item to the output position
 if (his[in[i]]<=0) j++; // keep it only if this was its last occurrence
 }
// now j holds the new in[] array size

[Notes]

  • If the value range is too big, with sparse areas, then you need to convert his[] into a dynamic list with two values per item: one is the value from in[] and the second is its occurrence count.
  • But then you need a nested loop -> O(N*M), or with binary search -> O(N*log(M)); see the sketch below.
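As a sketch of that last point, here is a hypothetical variant in Java (to stay consistent with the earlier answer's language) that keeps the value/occurrence-count pairs in a balanced map, so each lookup is logarithmic, roughly the O(N*log(M)) binary-search case from the notes:

import java.util.TreeMap;

// Sparse-range variant: a balanced map from value to occurrence count,
// logarithmic per lookup/insert, independent of the value range.
static boolean containsDuplicateSparse(int[] arr) {
    TreeMap<Integer, Integer> counts = new TreeMap<>();
    for (int v : arr) {
        int c = counts.merge(v, 1, Integer::sum); // increment the count for v
        if (c > 1) return true;                   // value seen before -> duplicate
    }
    return false;
}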