How to efficiently remove duplicates from an array

2018-12-31 09:39发布

I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But after tests with 1,000,000 elements it took very long time to finish. Is there something that I can do to improve my algorithm or any bugs to remove ?

I need to write my own implementation - not to use Set, HashSet etc. Or any other tools such as iterators. Simply an array to remove duplicates.

public static int[] removeDuplicates(int[] arr) {

    int end = arr.length;

    for (int i = 0; i < end; i++) {
        for (int j = i + 1; j < end; j++) {
            if (arr[i] == arr[j]) {                  
                int shiftLeft = j;
                for (int k = j+1; k < end; k++, shiftLeft++) {
                    arr[shiftLeft] = arr[k];
                }
                end--;
                j--;
            }
        }
    }

    int[] whitelist = new int[end];
    for(int i = 0; i < end; i++){
        whitelist[i] = arr[i];
    }
    return whitelist;
}

30条回答
宁负流年不负卿
2楼-- · 2018-12-31 10:02

Note: I am assuming the array is sorted.

Code:

int[] input = new int[]{1, 1, 3, 7, 7, 8, 9, 9, 9, 10};
int current = input[0];
boolean found = false;

for (int i = 0; i < input.length; i++) {
    if (current == input[i] && !found) {
        found = true;
    } else if (current != input[i]) {
        System.out.print(" " + current);
        current = input[i];
        found = false;
    }
}
System.out.print(" " + current);

output:

  1 3 7 8 9 10
查看更多
春风洒进眼中
3楼-- · 2018-12-31 10:02

Since this question is still getting a lot of attention, I decided to answer it by copying this answer from Code Review.SE:

You're following the same philosophy as the bubble sort, which is very, very, very slow. Have you tried this?:

  • Sort your unordered array with quicksort. Quicksort is much faster than bubble sort (I know, you are not sorting, but the algorithm you follow is almost the same as bubble sort to traverse the array).

  • Then start removing duplicates (repeated values will be next to each other). In a for loop you could have two indices: source and destination. (On each loop you copy source to destination unless they are the same, and increment both by 1). Every time you find a duplicate you increment source (and don't perform the copy). @morgano

查看更多
心情的温度
4楼-- · 2018-12-31 10:02
public static int[] removeDuplicates(int[] arr){
    HashSet<Integer> set = new HashSet<>();
    final int len = arr.length;
    //changed end to len
    for(int i = 0; i < len; i++){
        set.add(arr[i]);
    }

    int[] whitelist = new int[set.size()];
    int i = 0;
    for (Iterator<Integer> it = set.iterator(); it.hasNext();) {
        whitelist[i++] = it.next();
    }
    return whitelist;
}

Runs in O(N) time instead of your O(N^3) time

查看更多
无与为乐者.
5楼-- · 2018-12-31 10:03
class Demo 
{
    public static void main(String[] args) 
    {
        int a[]={3,2,1,4,2,1};
        System.out.print("Before Sorting:");
        for (int i=0;i<a.length; i++ )
        {
            System.out.print(a[i]+"\t");
        }
        System.out.print ("\nAfter Sorting:");
        //sorting the elements
        for(int i=0;i<a.length;i++)
        {
            for(int j=i;j<a.length;j++)
            {
                if(a[i]>a[j])
                {
                    int temp=a[i];
                    a[i]=a[j];
                    a[j]=temp;
                }

            }
        }

        //After sorting
        for(int i=0;i<a.length;i++)
        {
            System.out.print(a[i]+"\t");
        }
        System.out.print("\nAfter removing duplicates:");
        int b=0;
        a[b]=a[0];
        for(int i=0;i<a.length;i++)
        {
            if (a[b]!=a[i])
            {
                b++;
                a[b]=a[i];
            }
        }
        for (int i=0;i<=b;i++ )
        {
            System.out.print(a[i]+"\t");
        }
    }
}
  OUTPUT:Before Sortng:3 2 1 4 2 1 After Sorting:1 1 2 2 3 4 
                Removing Duplicates:1 2 3 4
查看更多
公子世无双
6楼-- · 2018-12-31 10:04

What if you create two boolean arrays: 1 for negative values and 1 for positive values and init it all on false.

Then you cycle thorugh the input array and lookup in the arrays if you've encoutered the value already. If not, you add it to the output array and mark it as already used.

查看更多
怪性笑人.
7楼-- · 2018-12-31 10:05
public static void main(String args[]) {
    int[] intarray = {1,2,3,4,5,1,2,3,4,5,1,2,3,4,5};

    Set<Integer> set = new HashSet<Integer>();
    for(int i : intarray) {
        set.add(i);
    }

    Iterator<Integer> setitr = set.iterator();
    for(int pos=0; pos < intarray.length; pos ++) {
        if(pos < set.size()) {
            intarray[pos] =setitr.next();
        } else {
            intarray[pos]= 0;
        }
    }

    for(int i: intarray)
    System.out.println(i);
}
查看更多
登录 后发表回答