How to efficiently remove duplicates from an array

2018-12-31 09:39发布

I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But after tests with 1,000,000 elements it took very long time to finish. Is there something that I can do to improve my algorithm or any bugs to remove ?

I need to write my own implementation - not to use Set, HashSet etc. Or any other tools such as iterators. Simply an array to remove duplicates.

public static int[] removeDuplicates(int[] arr) {

    int end = arr.length;

    for (int i = 0; i < end; i++) {
        for (int j = i + 1; j < end; j++) {
            if (arr[i] == arr[j]) {                  
                int shiftLeft = j;
                for (int k = j+1; k < end; k++, shiftLeft++) {
                    arr[shiftLeft] = arr[k];
                }
                end--;
                j--;
            }
        }
    }

    int[] whitelist = new int[end];
    for(int i = 0; i < end; i++){
        whitelist[i] = arr[i];
    }
    return whitelist;
}

30条回答
梦寄多情
2楼-- · 2018-12-31 09:43
public static void main(String[] args) {
        Integer[] intArray = { 1, 1, 1, 2, 4, 2, 3, 5, 3, 6, 7, 3, 4, 5 };
        Integer[] finalArray = removeDuplicates(intArray);
        System.err.println(Arrays.asList(finalArray));
    }

    private static Integer[] removeDuplicates(Integer[] intArray) {
        int count = 0;
        Integer[] interimArray = new Integer[intArray.length];
        for (int i = 0; i < intArray.length; i++) {
            boolean exists = false;
            for (int j = 0; j < interimArray.length; j++) {
                if (interimArray[j]!=null && interimArray[j] == intArray[i]) {
                    exists = true;
                }
            }
            if (!exists) {
                interimArray[count] = intArray[i];
                count++;
            }
        }
        final Integer[] finalArray = new Integer[count];
        System.arraycopy(interimArray, 0, finalArray, 0, count);
        return finalArray;
    }
查看更多
一个人的天荒地老
3楼-- · 2018-12-31 09:43

Why do all people not check this below lines?

I need to write my own implementation - not to use Set, HashSet etc. Or any other tools such as iterators. Simply an array to remove duplicates.

I'm posting very simple implementation with caring the above line.

public class RemoveDuplicates {

public static void main(String[] args) {

    int[] arr = { 1, 2, 3, 4, 2, 3, 1 }; // input array
    int len = arr.length;
    for (int i = 0; i < arr.length; i++) {
        for (int j = i + 1; j < len; j++) {
            if (arr[i] == arr[j]) {
                while (j < (len) - 1) {
                    arr[j] = arr[j - 1];
                    j++;
                }
                len--;
            }
        }
    }
    for (int i = 0; i < len; i++) {
        System.out.print("  " +arr[i]);
    }

   }
 }

Input : 1, 2, 3, 4, 2, 3, 1

Output : 1 2 3 4

查看更多
公子世无双
4楼-- · 2018-12-31 09:44
package com.pari.practice;

import java.util.HashSet;
import java.util.Iterator;

import com.pari.sort.Sort;

public class RemoveDuplicates {

 /**
 * brute force- o(N square)
 * 
 * @param input
 * @return
 */
public static int[] removeDups(int[] input){
    boolean[] isSame = new boolean[input.length];
    int sameNums = 0;

    for( int i = 0; i < input.length; i++ ){
        for( int j = i+1; j < input.length; j++){
            if( input[j] == input[i] ){ //compare same
                isSame[j] = true;
                sameNums++;
            }
        }
    }

    //compact the array into the result.
    int[] result = new int[input.length-sameNums];
    int count = 0;
    for( int i = 0; i < input.length; i++ ){
        if( isSame[i] == true) {
            continue;
        }
        else{
            result[count] = input[i];
            count++;
        }
    }

    return result;
}

/**
 * set - o(N)
 * does not guarantee order of elements returned - set property
 * 
 * @param input
 * @return
 */
public static int[] removeDups1(int[] input){
    HashSet myset = new HashSet();

    for( int i = 0; i < input.length; i++ ){
        myset.add(input[i]);
    }

    //compact the array into the result.
    int[] result = new int[myset.size()];
    Iterator setitr = myset.iterator();
    int count = 0;
    while( setitr.hasNext() ){
        result[count] = (int) setitr.next();
        count++;
    }

return result;
}

/**
 * quicksort - o(Nlogn)
 * 
 * @param input
 * @return
 */
public static int[] removeDups2(int[] input){
    Sort st = new Sort();
    st.quickSort(input, 0, input.length-1); //input is sorted

    //compact the array into the result.
    int[] intermediateResult = new int[input.length];
    int count = 0;
    int prev = Integer.MIN_VALUE;
    for( int i = 0; i < input.length; i++ ){
        if( input[i] != prev ){
            intermediateResult[count] = input[i];
            count++;
        }
        prev = input[i];
    }

    int[] result = new int[count];
    System.arraycopy(intermediateResult, 0, result, 0, count);

    return result;
}


public static void printArray(int[] input){
    for( int i = 0; i < input.length; i++ ){
        System.out.print(input[i] + " ");
    }
}

public static void main(String[] args){
    int[] input = {5,6,8,0,1,2,5,9,11,0};
    RemoveDuplicates.printArray(RemoveDuplicates.removeDups(input));
    System.out.println();
    RemoveDuplicates.printArray(RemoveDuplicates.removeDups1(input));
    System.out.println();
    RemoveDuplicates.printArray(RemoveDuplicates.removeDups2(input));
}
}

Output: 5 6 8 0 1 2 9 11

0 1 2 5 6 8 9 11

0 1 2 5 6 8 9 11

I have just written the above code for trying out. thanks.

查看更多
怪性笑人.
5楼-- · 2018-12-31 09:45

You need to sort your array then then loop and remove duplicates. As you cannot use other tools you need to write be code yourself.

You can easily find examples of quicksort in Java on the internet (on which this example is based).

public static void main(String[] args) throws Exception {
    final int[] original = new int[]{1, 1, 2, 8, 9, 8, 4, 7, 4, 9, 1};
    System.out.println(Arrays.toString(original));
    quicksort(original);
    System.out.println(Arrays.toString(original));
    final int[] unqiue = new int[original.length];
    int prev = original[0];
    unqiue[0] = prev;
    int count = 1;
    for (int i = 1; i < original.length; ++i) {
        if (original[i] != prev) {
            unqiue[count++] = original[i];
        }
        prev = original[i];
    }
    System.out.println(Arrays.toString(unqiue));
    final int[] compressed = new int[count];
    System.arraycopy(unqiue, 0, compressed, 0, count);
    System.out.println(Arrays.toString(compressed));
}

private static void quicksort(final int[] values) {
    if (values.length == 0) {
        return;
    }
    quicksort(values, 0, values.length - 1);
}

private static void quicksort(final int[] values, final int low, final int high) {
    int i = low, j = high;
    int pivot = values[low + (high - low) / 2];
    while (i <= j) {
        while (values[i] < pivot) {
            i++;
        }
        while (values[j] > pivot) {
            j--;
        }
        if (i <= j) {
            swap(values, i, j);
            i++;
            j--;
        }
    }
    if (low < j) {
        quicksort(values, low, j);
    }
    if (i < high) {
        quicksort(values, i, high);
    }
}

private static void swap(final int[] values, final int i, final int j) {
    final int temp = values[i];
    values[i] = values[j];
    values[j] = temp;
}

So the process runs in 3 steps.

  1. Sort the array - O(nlgn)
  2. Remove duplicates - O(n)
  3. Compact the array - O(n)

So this improves significantly on your O(n^3) approach.

Output:

[1, 1, 2, 8, 9, 8, 4, 7, 4, 9, 1]
[1, 1, 1, 2, 4, 4, 7, 8, 8, 9, 9]
[1, 2, 4, 7, 8, 9, 0, 0, 0, 0, 0]
[1, 2, 4, 7, 8, 9]

EDIT

OP states values inside array doesn't matter really. But I can assume that range is between 0-1000. This is a classic case where an O(n) sort can be used.

We create an array of size range +1, in this case 1001. We then loop over the data and increment the values on each index corresponding to the datapoint.

We can then compact the resulting array, dropping values the have not been incremented. This makes the values unique as we ignore the count.

public static void main(String[] args) throws Exception {
    final int[] original = new int[]{1, 1, 2, 8, 9, 8, 4, 7, 4, 9, 1, 1000, 1000};
    System.out.println(Arrays.toString(original));
    final int[] buckets = new int[1001];
    for (final int i : original) {
        buckets[i]++;
    }
    final int[] unique = new int[original.length];
    int count = 0;
    for (int i = 0; i < buckets.length; ++i) {
        if (buckets[i] > 0) {
            unique[count++] = i;
        }
    }
    final int[] compressed = new int[count];
    System.arraycopy(unique, 0, compressed, 0, count);
    System.out.println(Arrays.toString(compressed));
}

Output:

[1, 1, 2, 8, 9, 8, 4, 7, 4, 9, 1, 1000, 1000]
[1, 2, 4, 7, 8, 9, 1000]
查看更多
爱死公子算了
6楼-- · 2018-12-31 09:45

Please check this. It will work for both sorted/unsorted array. The complexity is O(n^2) same as bubble sort. Yes the complexity can be improved further with first sort and then binary search. But this is simple enough to work on every cases except negative element (-1). This can also be changed by using large integer value instead of -1.

void remove_duplicates(int *A, int N)

{

int i,j;

for (i=1; i<N; i++) {
    if (A[i] == -1) continue;
    for (j=i+1; j<=N; j++) {
        if (A[i] == A[j])
            A[j] = -1;
    }
}
}

int main() {

int N;
int A[1001];
int i;

printf("Enter N: ");
scanf("%d", &N);
printf("Enter the elements:\n");
for (i=1; i<=N; i++)
    scanf("%d", &A[i]);

remove_duplicates(A, N);
for (i=1; i<=N; i++) {
    if (A[i] == -1)
        continue;
    printf("%d  ", A[i]);
}
printf("\n");

return 0;
}
查看更多
人气声优
7楼-- · 2018-12-31 09:46

I feel Android Killer's idea is great, but I just wondered if we can leverage HashMap. So I did a little experiment. And I found HashMap seems faster than HashSet.

Here is code:

    int[] input = new int[1000000];

    for (int i = 0; i < input.length; i++) {
        Random random = new Random();
        input[i] = random.nextInt(200000);
    }

    long startTime1 = new Date().getTime();
    System.out.println("Set start time:" + startTime1);

    Set<Integer> resultSet = new HashSet<Integer>();

    for (int i = 0; i < input.length; i++) {
        resultSet.add(input[i]);
    }

    long endTime1 = new Date().getTime();
    System.out.println("Set end time:"+ endTime1);
    System.out.println("result of set:" + (endTime1 - startTime1));     
    System.out.println("number of Set:" + resultSet.size() + "\n");

    long startTime2 = new Date().getTime();
    System.out.println("Map start time:" + startTime1);

    Map<Integer, Integer> resultMap = new HashMap<Integer, Integer>();

    for (int i = 0; i < input.length; i++) {
        if (!resultMap.containsKey(input[i]))
            resultMap.put(input[i], input[i]);
    }

    long endTime2 = new Date().getTime();
    System.out.println("Map end Time:" + endTime2);
    System.out.println("result of Map:" + (endTime2 - startTime2));
    System.out.println("number of Map:" + resultMap.size());

Here is result:

Set start time:1441960583837
Set end time:1441960583917
result of set:80
number of Set:198652

Map start time:1441960583837
Map end Time:1441960583983
result of Map:66
number of Map:198652
查看更多
登录 后发表回答