ArrayList and HashSet insert performance test resu

2020-07-30 00:37发布

问题:

i wirte a class to test the insert performance between arraylist and hashset,as i expect ,the hashset insert performance will be much better than arraylist(maybe the book deceived me),but the test result make me so confused

    HashSet<String> hashSet = new HashSet<String>();

    long start = System.currentTimeMillis();
    for (int i = 0; i < 900000; i++) {
        hashSet.add(String.valueOf(i));
    }

    System.out.println("Insert HashSet Time: " + (System.currentTimeMillis() - start));


    ArrayList<String> arrayList = new ArrayList<String>();

    start = System.currentTimeMillis();

    for (int i = 0; i < 900000; i++) {
        arrayList.add(String.valueOf(i));
    }
    System.out.println("Insert ArrayList Time: " + (System.currentTimeMillis() - start));

result:
Insert HashSet Time: 978
Insert ArrayList Time: 287

i run this main metod many times and the result has no more different between this,the insert arraylist time is much shorter than insert hashset time can anybody explain this weird result.

回答1:

Exact performance characteristics of datastructures and algorithms are highly machine- and implementation-specific. However, it doesn't seem surprising to me that ArrayList inserts would be faster than HashSet inserts by a constant factor. To insert into an ArrayList, you just need to set a value at a particular index in an array. To insert into a hash set, you need to compute a hashcode for the inserted item and map that to an array index, check that index and possibly perform some action based on what you find, and finally insert into the array. Furthermore the HashSet will have worse memory locality so you'll get cache misses more often.

There's also the question of array resizing, which both data structures will need to do, but both data structures will need to resize at about the same rate (and hash table resizing is probably more expensive by a constant factor, too, due to rehashing).

Both algorithms are constant (expected) time, but there's a lot more stuff to do with a hash table than an array list. So it's not surprising that it would be slower by a constant factor. (Again, the exact difference is highly dependent on machine and implementation.)



回答2:

The hashset and list are different types of datastructures. So you should think about what you do want to do with them before choosing one.

HashSet

Longer insert time

Fast access time on elements

List

Fast append time

Long access time on elements

The list is faster because it can just add the element at the end of the list, the hashset has to find where to insert and then make the element accessable, this is more work (time) as adding it to the end of a list.



回答3:

the hashset insert performance will be much better than arraylist

Where did you get that idea?
HashSet will outperform ArrayList on search i.e: get().
But on insert they have comparable performance. Actually ArrayList is even faster if you are within array limits (no resize needed) and the hash function is not good



回答4:

Actually, you are getting the right results. Also, as pointed out in the above answer, these are different types of data-structures. Comparing them would be like comparing the speed of a bike with a car. I think the time for inserting in a HashSet must be more than that of insertion in an ArrayList because HashSet doesn't allow duplicate keys. So I assume that before insertion there must be some sort of checking for duplicate keys before insertion and how to handle them which makes them somewhat slower as compared to ArrayList.



回答5:

HashSet is backed by hashtable. If you know about hashtable, you would know that there is a hash function. also collision handling(if there was collision) when you add new element in it. Well hashSet doesn't handle collision, just overwrite the old value if hash are same. However if capacity reached, it need to resize, and possible re-hash. it would be very slow.

ArrayList just append the object to the end of the list. if size reached, it does resize.