How to efficiently remove duplicates from an array

2018-12-31 09:39发布

I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But after tests with 1,000,000 elements it took very long time to finish. Is there something that I can do to improve my algorithm or any bugs to remove ?

I need to write my own implementation - not to use Set, HashSet etc. Or any other tools such as iterators. Simply an array to remove duplicates.

public static int[] removeDuplicates(int[] arr) {

    int end = arr.length;

    for (int i = 0; i < end; i++) {
        for (int j = i + 1; j < end; j++) {
            if (arr[i] == arr[j]) {                  
                int shiftLeft = j;
                for (int k = j+1; k < end; k++, shiftLeft++) {
                    arr[shiftLeft] = arr[k];
                }
                end--;
                j--;
            }
        }
    }

    int[] whitelist = new int[end];
    for(int i = 0; i < end; i++){
        whitelist[i] = arr[i];
    }
    return whitelist;
}

30条回答
倾城一夜雪
2楼-- · 2018-12-31 09:40

Okay, so you cannot use Set or other collections. One solution I don't see here so far is one based on the use of a Bloom filter, which essentially is an array of bits, so perhaps that passes your requirements.

The Bloom filter is a lovely and very handy technique, fast and space-efficient, that can be used to do a quick check of the existence of an element in a set without storing the set itself or the elements. It has a (typically small) false positive rate, but no false negative rate. In other words, for your question, if a Bloom filter tells you that an element hasn't been seen so far, you can be sure it hasn't. But if it says that an element has been seen, you actually need to check. This still saves a lot of time if there aren't too many duplicates in your list (for those, there is no looping to do, except in the small probability case of a false positive --you typically chose this rate based on how much space you are willing to give to the Bloom filter (rule of thumb: less than 10 bits per unique element for a false positive rate of 1%).

There are many implementations of Bloom filters, see e.g. here or here, so I won't repeat that in this answer. Let us just assume the api described in that last reference, in particular, the description of put(E e):

true if the Bloom filter's bits changed as a result of this operation. If the bits changed, this is definitely the first time object has been added to the filter. If the bits haven't changed, this might be the first time object has been added to the filter. (...)

An implementation using such a Bloom filter would then be:

public static int[] removeDuplicates(int[] arr) {
    ArrayList<Integer> out = new ArrayList<>();
    int n = arr.length;
    BloomFilter<Integer> bf = new BloomFilter<>(...);  // decide how many bits and how many hash functions to use (compromise between space and false positive rate)

    for (int e : arr) {
        boolean might_contain = !bf.put(e);
        boolean found = false;
        if (might_contain) {
            // check if false positive
            for (int u : out) {
                if (u == e) {
                    found = true;
                    break;
                }
            }
        }
        if (!found) {
            out.add(e);
        }
    }
    return out.stream().mapToInt(i -> i).toArray();
}

Obviously, if you can alter the incoming array in place, then there is no need for an ArrayList: at the end, when you know the actual number of unique elements, just arraycopy() those.

查看更多
临风纵饮
3楼-- · 2018-12-31 09:40
 package javaa;

public class UniqueElementinAnArray 
{

 public static void main(String[] args) 
 {
    int[] a = {10,10,10,10,10,100};
    int[] output = new int[a.length];
    int count = 0;
    int num = 0;

    //Iterate over an array
    for(int i=0; i<a.length; i++)
    {
        num=a[i];
        boolean flag = check(output,num);
        if(flag==false)
        {
            output[count]=num;
            ++count;
        }

    }

    //print the all the elements from an array except zero's (0)
    for (int i : output) 
    {
        if(i!=0 )
            System.out.print(i+"  ");
    }

}

/***
 * If a next number from an array is already exists in unique array then return true else false
 * @param arr   Unique number array. Initially this array is an empty.
 * @param num   Number to be search in unique array. Whether it is duplicate or unique.
 * @return  true: If a number is already exists in an array else false 
 */
public static boolean check(int[] arr, int num)
{
    boolean flag = false;
    for(int i=0;i<arr.length; i++)
    {
        if(arr[i]==num)
        {
            flag = true;
            break;
        }
    }
    return flag;
}

}

查看更多
孤独总比滥情好
4楼-- · 2018-12-31 09:41

Slight modification to the original code itself, by removing the innermost for loop.

public static int[] removeDuplicates(int[] arr){
    int end = arr.length;

    for (int i = 0; i < end; i++) {
        for (int j = i + 1; j < end; j++) {
            if (arr[i] == arr[j]) {                  
                /*int shiftLeft = j;
                for (int k = j+1; k < end; k++, shiftLeft++) {
                    arr[shiftLeft] = arr[k];
                }*/
                arr[j] = arr[end-1];
                end--;
                j--;
            }
        }
    }

    int[] whitelist = new int[end];
    /*for(int i = 0; i < end; i++){
        whitelist[i] = arr[i];
    }*/
    System.arraycopy(arr, 0, whitelist, 0, end);
    return whitelist;
}
查看更多
若你有天会懂
5楼-- · 2018-12-31 09:41

Here is my solution. The time complexity is o(n^2)

public String removeDuplicates(char[] arr) {
        StringBuilder sb = new StringBuilder();

        if (arr == null)
            return null;
        int len = arr.length;

        if (arr.length < 2)
            return sb.append(arr[0]).toString();

        for (int i = 0; i < len; i++) {

            for (int j = i + 1; j < len; j++) {
                if (arr[i] == arr[j]) {
                    arr[j] = 0;

                }
            }
            if (arr[i] != 0)
                sb.append(arr[i]);
        }

        return sb.toString().trim();
    }
查看更多
春风洒进眼中
6楼-- · 2018-12-31 09:42

For a sorted Array, just check the next index:

//sorted data!
public static int[] distinct(int[] arr) {
    int[] temp = new int[arr.length];

    int count = 0;
    for (int i = 0; i < arr.length; i++) {
        int current = arr[i];

        if(count > 0 )
            if(temp[count - 1] == current)
                continue;

        temp[count] = current;
        count++;
    }

    int[] whitelist = new int[count];
    System.arraycopy(temp, 0, whitelist, 0, count);

    return whitelist;
}
查看更多
旧时光的记忆
7楼-- · 2018-12-31 09:43

There exists many solution of this problem.

  1. The sort approach

    • You sort your array and resolve only unique items
  2. The set approach

    • You declare a HashSet where you put all item then you have only unique ones.
  3. You create a boolean array that represent the items all ready returned, (this depend on your data in the array).

If you deal with large amount of data i would pick the 1. solution. As you do not allocate additional memory and sorting is quite fast. For small set of data the complexity would be n^2 but for large i will be n log n.

查看更多
登录 后发表回答