HashSet vs TreeSet vs LinkedHashSet on basis of ad

2019-02-02 05:42发布

问题:

I am learning heart of core java i.e. Collections.I would like to know that what happens internally when we add duplicate element in HashSet, TreeSet, LinkedHashSet.

Weather entry is replaced, ignored or exception is thrown and program terminates. And one sub question is, Which one has same or average time complexity for all its operations

Your response will be greatly appreciated.

回答1:

TreeSet, LinkedHashSet and HashSet in Java are three Set implementation in collection framework and like many others they are also used to store objects. Main feature of TreeSet is sorting, LinkedHashSet is insertion order and HashSet is just general purpose collection for storing object. HashSet is implemented using HashMap in Java while TreeSet is implemented using TreeMap. TreeSet is a SortedSet implementation which allows it to keep elements in the sorted order defined by either Comparable or Comparator interface. Comparable is used for natural order sorting and Comparator for custom order sorting of objects, which can be provided while creating instance of TreeSet. Anyway before seeing difference between TreeSet, LinkedHashSet and HashSet, let's see some similarities between them:

1) Duplicates : All three implements Set interface means they are not allowed to store duplicates.

2) Thread safety : HashSet, TreeSet and LinkedHashSet are not thread-safe, if you use them in multi-threading environment where at least one Thread modifies Set you need to externally synchronize them.

3) Fail-Fast Iterator : Iterator returned by TreeSet, LinkedHashSet and HashSet are fail-fast Iterator. i.e. If Iterator is modified after its creation by any way other than Iterators remove() method, it will throw ConcurrentModificationException with best of effort. read more about fail-fast vs fail-safe Iterator here

Now let’s see difference between HashSet, LinkedHashSet and TreeSet in Java :

Performance and Speed : First difference between them comes in terms of speed. HashSet is fastest, LinkedHashSet is second on performance or almost similar to HashSet but TreeSet is bit slower because of sorting operation it needs to perform on each insertion. TreeSet provides guaranteed O(log(n)) time for common operations like add, remove and contains, while HashSet and LinkedHashSet offer constant time performance e.g. O(1) for add, contains and remove given hash function uniformly distribute elements in bucket.

Ordering : HashSet does not maintain any order while LinkedHashSet maintains insertion order of elements much like List interface and TreeSet maintains sorting order or elements.

Internal Implementation : HashSet is backed by an HashMap instance, LinkedHashSet is implemented using HashSet and LinkedList while TreeSet is backed up by NavigableMap in Java and by default it uses TreeMap.

null : Both HashSet and LinkedHashSet allows null but TreeSet doesn't allow null and throw java.lang.NullPointerException when you will insert null into TreeSet. Since TreeSet uses compareTo() method of respective elements to compare them which throws NullPointerException while comparing with null, here is an example:

TreeSet cities
Exception in thread "main" java.lang.NullPointerException
        at java.lang.String.compareTo(String.java:1167)
        at java.lang.String.compareTo(String.java:92)
        at java.util.TreeMap.put(TreeMap.java:545)
        at java.util.TreeSet.add(TreeSet.java:238)

Comparison : HashSet and LinkedHashSet uses equals() method in Java for comparison but TreeSet uses compareTo() method for maintaining ordering. That's why compareTo() should be consistent to equals in Java. failing to do so break general contact of Set interface i.e. it can permit duplicates.

Use can use below link to see internal implementation http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashSet.java#HashSet.add%28java.lang.Object%29

From the source code 
Hashset hases Hashmap to store the data and LinkedHashSet extends Hashset and hence uses same add method of Hashset But TreeSet uses NavigableMap to store the data

Source: http://javarevisited.blogspot.com/2012/11/difference-between-treeset-hashset-vs-linkedhashset-java.html#ixzz2lGo6Y9mm



回答2:

This image may help you...

Image Source : http://javaconceptoftheday.com/hashset-vs-linkedhashset-vs-treeset-in-java/



回答3:

I haven't found much hard data on the differences, so I ran a benchmark for the 3 cases.

It appears that HashSet is about 4 times faster than TreeSet when adding (under certain circumstances, this will probably vary according to the exact characteristics of your data etc.).

# Run complete. Total time: 00:22:47

Benchmark                                                     Mode  Cnt  Score   Error  Units
DeduplicationWithSetsBenchmark.deduplicateWithHashSet        thrpt  200  7.734 ▒ 0.133  ops/s
DeduplicationWithSetsBenchmark.deduplicateWithLinkedHashSet  thrpt  200  7.100 ▒ 0.171  ops/s
DeduplicationWithSetsBenchmark.deduplicateWithTreeSet        thrpt  200  1.983 ▒ 0.032  ops/s

Here is the benchmark code:

package my.app;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class DeduplicationWithSetsBenchmark {

    static Item[] inputData = makeInputData();

    @Benchmark
    public int deduplicateWithHashSet() {
        return deduplicate(new HashSet<>());
    }

    @Benchmark
    public int deduplicateWithLinkedHashSet() {
        return deduplicate(new LinkedHashSet<>());
    }

    @Benchmark
    public int deduplicateWithTreeSet() {
        return deduplicate(new TreeSet<>(Item.comparator()));
    }

    private int deduplicate(Set<Item> set) {
        for (Item i : inputData) {
            set.add(i);
        }
        return set.size();
    }

    public static void main(String[] args) throws RunnerException {

        // Verify that all 3 methods give the same answers:
        DeduplicationWithSetsBenchmark x = new DeduplicationWithSetsBenchmark();
        int count = x.deduplicateWithHashSet();
        assert(count < inputData.length);
        assert(count == x.deduplicateWithLinkedHashSet());
        assert(count == x.deduplicateWithTreeSet());


        Options opt = new OptionsBuilder()
            .include(DeduplicationWithSetsBenchmark.class.getSimpleName())
            .forks(1)
            .build();

        new Runner(opt).run();
    }

    private static Item[] makeInputData() {
        int count = 1000000;
        Item[] acc = new Item[count];
        Random rnd = new Random();

        for (int i=0; i<count; i++) {
            Item item = new Item();
            // We are looking to include a few collisions, so restrict the space of the values
            item.name = "the item name " + rnd.nextInt(100);
            item.id = rnd.nextInt(100);
            acc[i] = item;
        }
        return acc;
    }

    private static class Item {
        public String name;
        public int id;

        public String getName() {
            return name;
        }

        public int getId() {
            return id;
        }

        @Override
        public boolean equals(Object obj) {
            Item other = (Item) obj;

            return name.equals(other.name) && id == other.id;
        }

        @Override
        public int hashCode() {
            return name.hashCode() * 13 + id;
        }

        static Comparator<Item> comparator() {
            return Comparator.comparing(Item::getName, Comparator.naturalOrder())
                .thenComparing(Item::getId, Comparator.naturalOrder());
        }
    }
}


回答4:

tldr: Repeat values are ignored by these collections.

I haven't seen a complete answer to the bolded part of the question, what EXACTLY happens to duplicates? Does it overwrite the old object or ignore the new one? Consider this example of an object where one field determines equality but there is extra data that could vary:

public class MyData implements Comparable {
    public final Integer valueDeterminingEquality;
    public final String extraData;

    public MyData(Integer valueDeterminingEquality, String extraData) {
        this.valueDeterminingEquality = valueDeterminingEquality;
        this.extraData = extraData;
    }

    @Override
    public boolean equals(Object o) {
        return valueDeterminingEquality.equals(((MyData) o).valueDeterminingEquality);
    }

    @Override
    public int hashCode() {
        return valueDeterminingEquality.hashCode();
    }

    @Override
    public int compareTo(Object o) {
        return valueDeterminingEquality.compareTo(((MyData)o).valueDeterminingEquality);
    }
}

This unit test shows that duplicate values are ignored by all 3 collections:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

import java.util.*;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

@RunWith(Parameterized.class)
public class SetRepeatedItemTest {
    private final Set<MyData> testSet;

    public SetRepeatedItemTest(Set<MyData> testSet) {
        this.testSet = testSet;
    }

    @Parameterized.Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
                { new TreeSet() }, { new HashSet() }, { new LinkedHashSet()}
        });
    }

    @Test
    public void testTreeSet() throws Exception {
        testSet.add(new MyData(1, "object1"));
        testSet.add(new MyData(1, "object2"));
        assertThat(testSet.size(), is(1));
        assertThat(testSet.iterator().next().extraData, is("object1"));
    }
}

I also looked into the implementation of TreeSet, which we know uses TreeMap... In TreeSet.java:

public boolean add(E var1) {
    return this.m.put(var1, PRESENT) == null;
}

Instead of showing TreeMap's entire put method, here's the relevant search loop:

parent = t;
cmp = k.compareTo(t.key);
if (cmp < 0)
        t = t.left;
else if (cmp > 0)
        t = t.right;
else
    return t.setValue(value);
} while (t != null);

so if cmp == 0, ie we've found a duplicate entry, we return early instead of adding a child at the end of the loop. The call to setValue doesn't actually do anything, because TreeSet is using dummy data for the value here, the important thing is that the key doesn't change. If you look into HashMap you'll see the same behavior.