Given an array of integers, find the first integer

Published 2019-03-31 12:47

Question:

Given an array of integers, find the first integer that is unique.

My solution: use std::map.

Put each integer into the map one by one (number as key, its index as value), which I make O(n^2 lg n); if there is a duplicate, remove the entry from the map (O(lg n)). After putting all the numbers into the map, iterate over it and find the key with the smallest index, which is O(n).

I say O(n^2 lg n) because the map needs to keep its keys sorted.
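
For reference, a minimal C++ sketch of this approach (the function name and the extra "removed" set are my own additions; without it, a value that appears three or more times would be re-inserted after being erased):

#include <map>
#include <optional>
#include <set>
#include <vector>

std::optional<int> first_unique_map(const std::vector<int>& a) {
    std::map<int, int> index_of;   // value -> index of its (so far) only occurrence
    std::set<int> removed;         // values already known to repeat (my addition)
    for (int i = 0; i < static_cast<int>(a.size()); ++i) {
        if (removed.count(a[i])) continue;
        auto it = index_of.find(a[i]);
        if (it == index_of.end()) {
            index_of[a[i]] = i;            // first sighting
        } else {
            index_of.erase(it);            // duplicate: drop the entry
            removed.insert(a[i]);
        }
    }
    // The map is keyed by value, so scan it for the entry with the smallest index.
    std::optional<int> result;
    int best = static_cast<int>(a.size());
    for (const auto& [value, idx] : index_of)
        if (idx < best) { best = idx; result = value; }
    return result;
}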

It is not efficient.

Are there better solutions?

Answer 1:

I believe that the following would be the optimal solution, at least based on time / space complexity:

Step 1: Store the integers in a hash map, which holds the integer as a key and the count of the number of times it appears as the value. This is generally an O(n) operation and the insertion / updating of elements in the hash table should be constant time, on the average. If an integer is found to appear more than twice, you really don't have to increment the usage count further (if you don't want to).

Step 2: Perform a second pass over the integers. Look each up in the hash map and the first one with an appearance count of one is the one you were looking for (i.e., the first single appearing integer). This is also O(n), making the entire process O(n).
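
A minimal C++ sketch of these two passes might look like the following (function and variable names are illustrative, not from the answer):

#include <optional>
#include <unordered_map>
#include <vector>

std::optional<int> first_unique(const std::vector<int>& a) {
    std::unordered_map<int, int> count;   // value -> number of appearances
    for (int v : a) ++count[v];           // step 1: count every value
    for (int v : a)                       // step 2: first value that appears exactly once
        if (count[v] == 1) return v;
    return std::nullopt;                  // no single-appearing integer
}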

Some possible optimizations for special cases:

Optimization A: It may be possible to use a simple array instead of a hash table. This guarantees O(1) even in the worst case for counting the number of occurrences of a particular integer as well as the lookup of its appearance count. Also, this enhances real time performance, since the hash algorithm does not need to be executed. There may be a hit due to potentially poorer locality of reference (i.e., a larger sparse table vs. the hash table implementation with a reasonable load factor). However, this would be for very special cases of integer orderings and may be mitigated by the hash table's hash function producing pseudorandom bucket placements based on the incoming integers (i.e., poor locality of reference to begin with).

Each byte in the array would represent the count (up to 255) for the integer represented by the index of that byte. This would only be possible if the difference between the lowest integer and the highest (i.e., the cardinality of the domain of valid integers) was small enough such that this array would fit into memory. The index in the array of a particular integer would be its value minus the smallest integer present in the data set.

For example on modern hardware with a 64-bit OS, it is quite conceivable that a 4GB array can be allocated which can handle the entire domain of 32-bit integers. Even larger arrays are conceivable with sufficient memory.

The smallest and largest integers would have to be known before processing, or another linear pass through the data using the minmax algorithm to find out this information would be required.
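
A possible sketch of Optimization A, assuming the value range (max - min) is small enough for the byte array to fit in memory and using a saturating one-byte counter as described (names are mine):

#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

std::optional<int> first_unique_counting(const std::vector<int>& a) {
    if (a.empty()) return std::nullopt;
    auto [lo, hi] = std::minmax_element(a.begin(), a.end());   // extra linear pass for the domain
    long long base = *lo;
    std::size_t span = static_cast<std::size_t>(static_cast<long long>(*hi) - base + 1);
    std::vector<std::uint8_t> count(span, 0);     // one byte per possible value
    for (int v : a) {
        std::uint8_t& c = count[static_cast<std::size_t>(v - base)];
        if (c < 255) ++c;                         // counts above 2 don't change the answer
    }
    for (int v : a)
        if (count[static_cast<std::size_t>(v - base)] == 1) return v;
    return std::nullopt;
}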

Optimization B: You could optimize Optimization A further, by using at most 2 bits per integer (One bit indicates presence and the other indicates multiplicity). This would allow for the representation of four integers per byte, extending the array implementation to handle a larger domain of integers for a given amount of available memory. More bit games could be played here to compress the representation further, but they would only support special cases of data coming in and therefore cannot be recommended for the still mostly general case.
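
A possible sketch of Optimization B; the exact packing (bit 0 for presence, bit 1 for multiplicity, four values per byte) is one particular layout chosen by me, not spelled out in the answer:

#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

std::optional<int> first_unique_2bit(const std::vector<int>& a) {
    if (a.empty()) return std::nullopt;
    auto [lo, hi] = std::minmax_element(a.begin(), a.end());
    long long base = *lo;
    std::size_t span = static_cast<std::size_t>(static_cast<long long>(*hi) - base + 1);
    std::vector<std::uint8_t> bits((span + 3) / 4, 0);   // 2 bits per possible value

    auto flag = [&](std::size_t idx, unsigned bit) -> bool {
        return (bits[idx / 4] >> ((idx % 4) * 2 + bit)) & 1u;
    };
    auto mark = [&](std::size_t idx, unsigned bit) {
        bits[idx / 4] |= static_cast<std::uint8_t>(1u << ((idx % 4) * 2 + bit));
    };

    for (int v : a) {
        std::size_t idx = static_cast<std::size_t>(v - base);
        if (flag(idx, 0)) mark(idx, 1);   // bit 1: seen more than once
        else mark(idx, 0);                // bit 0: seen at least once
    }
    for (int v : a) {
        std::size_t idx = static_cast<std::size_t>(v - base);
        if (flag(idx, 0) && !flag(idx, 1)) return v;   // present exactly once
    }
    return std::nullopt;
}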



Answer 2:

All this for no reason. Just using two for-loops and a variable would give you a simple O(n^2) algorithm.

If you are going to all the trouble of using a hash map, then it might as well be what @Micheal Goldshteyn suggests.

UPDATE: I know this question is a year old, but I was looking through the questions I had answered, came across this one, and thought of a better solution than using a hash table.

When we say unique, we will have a pattern. E.g.: [5, 5, 66, 66, 7, 1, 1, 77]. In this, let's use a moving window of 3. First consider (5, 5, 66): we can easily establish that there is a duplicate here, so move the window by one element to get (5, 66, 66). Same here. Move to the next, (66, 66, 7): again duplicates here. Next, (66, 7, 1): no duplicates here! Take the middle element, as this has to be the first unique value in the set. The left element belongs to the duplicates, and so could the 1 on the right. Hence 7 is the first unique element.

Space: O(1). Time: O(n) * O(m^2) = O(n) * 9 ≈ O(n), where m = 3 is the window size.
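
Under the assumption this answer relies on, that equal values always sit next to each other, the window test reduces to comparing each element with its neighbours. A rough C++ sketch (the boundary handling is my own, not part of the answer):

#include <optional>
#include <vector>

std::optional<int> first_unique_window(const std::vector<int>& a) {
    int n = static_cast<int>(a.size());
    for (int i = 0; i < n; ++i) {
        bool same_left  = (i > 0)     && a[i] == a[i - 1];
        bool same_right = (i + 1 < n) && a[i] == a[i + 1];
        if (!same_left && !same_right) return a[i];   // run of length 1: unique
    }
    return std::nullopt;   // no unique element under the adjacency assumption
}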



Answer 3:

Inserting into a map is O(log n), not O(n log n), so inserting n keys will be O(n log n). Also, it is better to use a set.
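
One way to read this suggestion, sketched in C++ (my code, not the author's): one set for values seen, one for values seen more than once, then a second pass over the array.

#include <optional>
#include <set>
#include <vector>

std::optional<int> first_unique_sets(const std::vector<int>& a) {
    std::set<int> seen, repeated;        // O(log n) per insert / lookup
    for (int v : a)
        if (!seen.insert(v).second)      // insert() reports whether v was new
            repeated.insert(v);
    for (int v : a)
        if (repeated.count(v) == 0)      // first value that never repeats
            return v;
    return std::nullopt;
}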



Answer 4:

Although it's O(n^2), the following has small coefficients, isn't too bad on the cache, and uses memmem() which is fast.

// memmem() is a GNU extension: #define _GNU_SOURCE before #include <string.h>.
// Matching raw bytes can give false positives when the pattern straddles two
// adjacent ints, so treat this as a quick-and-dirty sketch.
for (int x = 0; x < len; x++)
    if (memmem(&array[x + 1], sizeof(int) * (len - (x + 1)), &array[x], sizeof(int)) == NULL &&
        memmem(array, sizeof(int) * x, &array[x], sizeof(int)) == NULL)
            return array[x];


Answer 5:

public static string firstUnique(int[] input)
{
    int size = input.Length;
    bool[] dupIndex = new bool[size];          // true once input[k] is known to repeat
    for (int i = 0; i < size; ++i)
    {
        if (dupIndex[i])
        {
            continue;                          // already marked as a duplicate
        }
        bool hasDuplicate = false;
        for (int j = i + 1; j < size; ++j)
        {
            if (input[i] == input[j])
            {
                dupIndex[j] = true;            // mark every later copy, not just the first
                hasDuplicate = true;
            }
        }
        if (!hasDuplicate)
        {
            // No copy after i, and none before (otherwise dupIndex[i] would be set).
            return input[i].ToString();
        }
    }
    return "No unique element";
}


Answer 6:

@user3612419

The solution you gave is good, running in roughly O(N^2), but further optimization of the same code is possible; I just added the two or three lines that you missed, which skip indices already marked as duplicates.



public static string firstUnique(int[] input)
{
    int size = input.Length;
    bool[] dupIndex = new bool[size];
    for (int i = 0; i < size; ++i)
    {
        if (dupIndex[i])
        {
            continue;
        }
        bool hasDuplicate = false;
        for (int j = i + 1; j < size; ++j)
        {
            if (dupIndex[j])
            {
                continue;                      // added: skip indices already known to repeat
            }
            if (input[i] == input[j])
            {
                dupIndex[j] = true;
                dupIndex[i] = true;            // added: mark the current element as well
                hasDuplicate = true;
            }
        }
        if (!hasDuplicate)
        {
            return input[i].ToString();
        }
    }
    return "No unique element";
}