Remove duplicates (both occurrences) from a list

Published 2020-06-01 02:28

Question:

I have an ArrayList with the following strings:

 List<String> e = new ArrayList<String>();
 e.add("123");
 e.add("122");
 e.add("125");
 e.add("123");

I want to check the list for duplicates and remove both copies of any duplicated value. In this case the list should end up with only two values, 122 and 125; the two 123s go away entirely.

What would be the best way to do this? I was thinking of using a Set, but that would only remove one of the duplicates.

Answer 1:

In Java 8 you can do:

e.removeIf(s -> Collections.frequency(e, s) > 1);
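
For context, a complete runnable version of that one-liner might look like this (list values taken from the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RemoveDuplicates {
    public static void main(String[] args) {
        List<String> e = new ArrayList<>(Arrays.asList("123", "122", "125", "123"));
        // Removes every element that occurs more than once in the list.
        e.removeIf(s -> Collections.frequency(e, s) > 1);
        System.out.println(e); // [122, 125]
    }
}
```

Note that Collections.frequency scans the whole list for each element, so this is O(n^2); Answer 6 below discusses a linear alternative.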

If you are not on Java 8, you can create a HashMap<String, Integer>. If the String already appears in the map, increment its count by one; otherwise, add it to the map with a count of 1.

For example:

put("123", 1);

Now, when you encounter "123" again, get the current count for that key and add one to it:

put("123", get("123") + 1);

Now you can easily iterate over the map and build a new ArrayList from the keys whose value is 1.
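
Spelled out, that pre-Java-8 counting approach might look like the following sketch. Iterating the original list in the second pass (rather than the map's key set) preserves the original order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountAndFilter {
    public static void main(String[] args) {
        List<String> e = new ArrayList<>(Arrays.asList("123", "122", "125", "123"));

        // First pass: count each string's occurrences.
        Map<String, Integer> counts = new HashMap<>();
        for (String s : e) {
            if (counts.containsKey(s)) {
                counts.put(s, counts.get(s) + 1); // seen before: increment
            } else {
                counts.put(s, 1);                 // first sighting
            }
        }

        // Second pass: keep only the strings that occurred exactly once.
        List<String> result = new ArrayList<>();
        for (String s : e) {
            if (counts.get(s) == 1) {
                result.add(s);
            }
        }
        System.out.println(result); // [122, 125]
    }
}
```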

References:

  • ArrayList#removeIf
  • Collections#frequency
  • HashMap


Answer 2:

You can also use filter in Java 8. Note that the stream does not modify e; collect the result into a new list:

List<String> unique = e.stream().filter(s -> Collections.frequency(e, s) == 1).collect(Collectors.toList());


Answer 3:

You could use a HashMap<String, Integer>.

You iterate over the list, and if the map does not contain the string, you add it with a value of 1.

If, on the other hand, the string is already present, you simply increment its counter. For your list, the map would then look like this:

"123" -> 2
"122" -> 1
"125" -> 1

You would then create a new list containing every key whose value is 1.



Answer 4:

Here is a non-Java 8 solution using a map to count occurrences:

Map<String, Integer> map = new HashMap<String, Integer>();
for (String s : list) {
    if (map.get(s) == null) {
        map.put(s, 1);
    } else {
        map.put(s, map.get(s) + 1);
    }
}

List<String> duplicates = new ArrayList<String>();

// Collect the values that occur more than once...
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    if (entry.getValue() > 1) {
        duplicates.add(entry.getKey());
    }
}

// ...and remove every occurrence of them from the list.
list.removeAll(duplicates);


Answer 5:

A solution using only basic ArrayList operations:

public static void main(String[] args) throws Exception {
    List<String> e = new ArrayList<String>();
    List<String> duplicate = new ArrayList<String>();
    e.add("123");
    e.add("122");
    e.add("125");
    e.add("123");

    // A value is duplicated if its first and last index differ.
    for (String str : e) {
        if (e.indexOf(str) != e.lastIndexOf(str)) {
            duplicate.add(str);
        }
    }

    for (String str : duplicate) {
        e.remove(str);
    }

    for (String str : e) {
        System.out.println(str);
    }
}


Answer 6:

The simplest solutions using streams have O(n^2) time complexity. If you try them on a List with millions of entries, you'll be waiting a very, very long time. An O(n) solution is:

list = list.stream()
           .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
           .entrySet()
           .stream()
           .filter(e -> e.getValue() == 1)
           .map(Map.Entry::getKey)
           .collect(Collectors.toList());

Here, I used a LinkedHashMap to maintain the order. Note that static imports can simplify the collect part.

This is so complicated that I think using for loops is the best option for this problem.

Map<String, Integer> map = new LinkedHashMap<>();
for (String s : list)
    map.merge(s, 1, Integer::sum);
list = new ArrayList<>();
for (Map.Entry<String, Integer> e : map.entrySet())
    if (e.getValue() == 1)
        list.add(e.getKey());


Answer 7:

List<String> e = new ArrayList<String>();
e.add("123");
e.add("122");
e.add("125");
e.add("123");
e.add("125");
e.add("124");

// Toggle: the first occurrence adds a value, the second removes it again.
// Note this only works when no value occurs more than twice.
List<String> filtered = new ArrayList<String>();
for (String current : e) {
    if (!filtered.contains(current)) {
        filtered.add(current);
    } else {
        filtered.remove(current);
    }
}
e.clear();
e.addAll(filtered);


Answer 8:

I'm a fan of the Google Guava API. Using the Collections2 utility and a generic Predicate implementation it's possible to create a utility method to cover multiple data types.

This assumes that the objects in question have a meaningful .equals implementation.

@Test
public void testTrimDupList() {
    Collection<String> dups = Lists.newArrayList("123", "122", "125", "123");
    dups = removeAll("123", dups);
    Assert.assertFalse(dups.contains("123"));

    Collection<Integer> dups2 = Lists.newArrayList(123, 122, 125, 123);
    dups2 = removeAll(123, dups2);
    Assert.assertFalse(dups2.contains(123));
}

private <T> Collection<T> removeAll(final T element, Collection<T> collection) {
    return Collections2.filter(collection, new Predicate<T>() {
        @Override
        public boolean apply(T arg0) {
            return !element.equals(arg0);
        }
    });
}

Thinking about this a bit more

Most of the other examples on this page use the java.util.List API as the base Collection. I'm not sure if that is done with intent, but if the returned element has to be a List, an intermediary method can be added as shown below. Polymorphism ftw!

@Test
public void testTrimDupListAsCollection() {
    Collection<String> dups = Lists.newArrayList("123", "122", "125", "123");
    // List used here only to get access to the .contains method for validating behavior.
    dups = Lists.newArrayList(removeAll("123", dups));
    Assert.assertFalse(dups.contains("123"));

    Collection<Integer> dups2 = Lists.newArrayList(123, 122, 125, 123);
    // List used here only to get access to the .contains method for validating behavior.
    dups2 = Lists.newArrayList(removeAll(123, dups2));
    Assert.assertFalse(dups2.contains(123));
}

@Test
public void testTrimDupListAsList() {
    List<String> dups = Lists.newArrayList("123", "122", "125", "123");
    dups = removeAll("123", dups);
    Assert.assertFalse(dups.contains("123"));

    List<Integer> dups2 = Lists.newArrayList(123, 122, 125, 123);
    dups2 = removeAll(123, dups2);
    Assert.assertFalse(dups2.contains(123));
}

private <T> List<T> removeAll(final T element, List<T> collection) {
    return Lists.newArrayList(removeAll(element, (Collection<T>) collection));
}

private <T> Collection<T> removeAll(final T element, Collection<T> collection) {
    return Collections2.filter(collection, new Predicate<T>() {
        @Override
        public boolean apply(T arg0) {
            return !element.equals(arg0);
        }
    });
}


Answer 9:

Something like this, using a second Set as a blacklist:

Set<Object> set = new HashSet<>();
Set<Object> blackList = new HashSet<>();

public void add(Object object) {
    if (blackList.contains(object)) {
        return;
    }
    // Set.add returns false if the element was already present.
    if (!set.add(object)) {
        set.remove(object);
        blackList.add(object);
    }
}
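
A self-contained version of this two-set pattern, runnable on the question's data (the class and field names here are my own; a LinkedHashSet keeps the surviving values in their original order):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class TwoSetFilter {
    // 'seen' holds the values kept so far; 'blackList' holds values already
    // identified as duplicates, so any later occurrence is ignored too.
    final Set<Object> seen = new LinkedHashSet<>();
    final Set<Object> blackList = new HashSet<>();

    public void add(Object object) {
        if (blackList.contains(object)) {
            return;                  // third or later occurrence
        }
        if (!seen.add(object)) {     // add() returns false if already present
            seen.remove(object);     // second occurrence: drop both copies
            blackList.add(object);
        }
    }

    public static void main(String[] args) {
        TwoSetFilter filter = new TwoSetFilter();
        for (String s : Arrays.asList("123", "122", "125", "123")) {
            filter.add(s);
        }
        System.out.println(filter.seen); // [122, 125]
    }
}
```

Unlike the toggle approach in Answer 7, the blacklist also handles values that occur three or more times.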


Answer 10:

If you want to go with sets, you can achieve this with two of them: one collecting the distinct values and one collecting the duplicates, as follows:

List<String> duplicateList = new ArrayList<String>();

duplicateList.add("123");
duplicateList.add("122");
duplicateList.add("125");
duplicateList.add("123");
duplicateList.add("127");
duplicateList.add("127");

System.out.println(duplicateList);

Set<String> distinctValues = new TreeSet<String>();
Set<String> duplicateValues = new TreeSet<String>();

// Set.add returns false when the value is already present,
// which is exactly how a duplicate is detected.
for (String s : duplicateList) {
    if (!distinctValues.add(s)) {
        duplicateValues.add(s);
    }
}

duplicateList.removeAll(duplicateValues);

System.out.println(duplicateList);
System.out.println(duplicateValues);

Output:

Original list: [123, 122, 125, 123, 127, 127]
After removing duplicates: [122, 125]
Values which were duplicated: [123, 127]


Note: This solution is not necessarily optimal; you might find a better one.



Answer 11:

With the Guava library, using a multiset and streams:

e = HashMultiset.create(e).entrySet().stream()
    .filter(me -> me.getCount() == 1)
    .map(me -> me.getElement())
    .collect(Collectors.toList());

This is pretty, and reasonably fast for large lists (O(n), with a rather large constant factor). But it does not preserve the original order (a LinkedHashMultiset can be used if that is desired), and it creates a new list instance.

It is also easy to generalise this, for example to instead remove every value that occurs three or more times.
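
For instance, dropping everything that occurs three or more times can be sketched without the Guava dependency using a plain count map (keepBelow and its names are my own, not part of the answer above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RemoveRepeats {
    // Keeps only the elements occurring fewer than 'limit' times;
    // limit = 2 reproduces the duplicate-removal behaviour from the question.
    static List<String> keepBelow(List<String> list, int limit) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : list) {
            counts.merge(s, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (String s : list) {
            if (counts.get(s) < limit) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "a", "a", "b", "b", "c");
        System.out.println(keepBelow(list, 3)); // [b, b, c]
        System.out.println(keepBelow(list, 2)); // [c]
    }
}
```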

In general, the multiset data structure is a really useful one to keep in one's toolbox.