可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I\'ve always loved trees, that nice O(n*log(n))
and the tidiness of them. However, every software engineer I\'ve ever known has asked me pointedly why I would use a TreeSet
. From a CS background, I don\'t think it matters all that much which you use, and I don\'t care to mess around with hash functions and buckets (in the case of Java
).
In which cases should I use a HashSet
over a TreeSet
?
回答1:
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
HashSet
- the class offers constant time performance for the basic operations (add, remove, contains and size).
- it does not guarantee that the order of elements will remain constant over time
- iteration performance depends on the initial capacity and the load factor of the HashSet.
- It\'s quite safe to accept default load factor but you may want to specify an initial capacity that\'s about twice the size to which you expect the set to grow.
TreeSet
- guarantees log(n) time cost for the basic operations (add, remove and contains)
- guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements
SortedSet
)
- doesn\'t offer any tuning parameters for iteration performance
- offers a few handy methods to deal with the ordered set like
first()
, last()
, headSet()
, and tailSet()
etc
Important points:
- Both guarantee duplicate-free collection of elements
- It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
- None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
- LinkedHashSet is in some sense intermediate between
HashSet
and TreeSet
. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.
So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.
- e.g.
SortedSet<String> s = new TreeSet<String>(hashSet);
回答2:
One advantage not yet mentioned of a TreeSet
is that its has greater \"locality\", which is shorthand for saying (1) if two entries are nearby in the order, a TreeSet
places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.
This is in contrast to a HashSet
, which spreads the entries all over memory, no matter what their keys are.
When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the TreeSet
can be a much better choice.
回答3:
HashSet
is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the set isn\'t possible.
TreeSet
is useful if maintaining an order(In terms of values and not the insertion order) matters to you. But, as you\'ve noted, you\'re trading order for slower time to access an element: O(log n) for basic operations.
From the javadocs for TreeSet
:
This implementation provides guaranteed log(n) time cost for the basic operations (add
, remove
and contains
).
回答4:
1.HashSet allows null object.
2.TreeSet will not allow null object. If you try to add null value it will throw a NullPointerException.
3.HashSet is much faster than TreeSet.
e.g.
TreeSet<String> ts = new TreeSet<String>();
ts.add(null); // throws NullPointerException
HashSet<String> hs = new HashSet<String>();
hs.add(null); // runs fine
回答5:
Basing on lovely visual answer on Maps by @shevchyk here is my take:
╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗
║ Property ║ HashSet ║ TreeSet ║ LinkedHashSet ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ ║ no guarantee order ║ sorted according ║ ║
║ Order ║ will remain constant║ to the natural ║ insertion-order ║
║ ║ over time ║ ordering ║ ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ Add/remove ║ O(1) ║ O(log(n)) ║ O(1) ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ ║ ║ NavigableSet ║ ║
║ Interfaces ║ Set ║ Set ║ Set ║
║ ║ ║ SortedSet ║ ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ ║ ║ not allowed ║ ║
║ Null values ║ allowed ║ 1st element only ║ allowed ║
║ ║ ║ in Java 7 ║ ║
╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣
║ ║ Fail-fast behavior of an iterator cannot be guaranteed ║
║ Fail-fast ║ impossible to make any hard guarantees in the presence of ║
║ behavior ║ unsynchronized concurrent modification ║
╠══════════════╬═══════════════════════════════════════════════════════════════╣
║ Is ║ ║
║ synchronized ║ implementation is not synchronized ║
╚══════════════╩═══════════════════════════════════════════════════════════════╝
回答6:
The reason why most use HashSet
is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be \"messing around with hash functions\" as that has been done for you. If the set contains custom classes, you have to implement hashCode
to use HashSet
(although Effective Java shows how), but if you use a TreeSet
you have to make it Comparable
or supply a Comparator
. This can be a problem if the class does not have a particular order.
I have sometimes used TreeSet
(or actually TreeMap
) for very small sets/maps (< 10 items) although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.
Now if you need the sorted, then TreeSet
is appropriate, although even then if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.
回答7:
If you aren\'t inserting enough elements to result in frequent rehashings (or collisions, if your HashSet can\'t resize), a HashSet certainly gives you the benefit of constant time access. But on sets with lots of growth or shrinkage, you may actually get better performance with Treesets, depending on the implementation.
Amortized time can be close to O(1) with a functional red-black tree, if memory serves me. Okasaki\'s book would have a better explanation than I can pull off. (Or see his publication list)
回答8:
HashSet implementations are, of course, much much faster -- less overhead because there\'s no ordering. A good analysis of the various Set implementations in Java is provided at http://java.sun.com/docs/books/tutorial/collections/implementations/set.html.
The discussion there also points out an interesting \'middle ground\' approach to the Tree vs Hash question. Java provides a LinkedHashSet, which is a HashSet with an \"insertion-oriented\" linked list running through it, that is, the last element in the linked list is also the most recently inserted into the Hash. This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet.
回答9:
The TreeSet is one of two sorted collections (the other being
TreeMap). It uses a Red-Black tree structure (but you knew that), and guarantees
that the elements will be in ascending order, according to natural order. Optionally,
you can construct a TreeSet with a constructor that lets you give the collection your
own rules for what the order should be (rather than relying on the ordering defined
by the elements\' class) by using a Comparable or Comparator
and A LinkedHashSet is an ordered version of HashSet that
maintains a doubly-linked List across all elements. Use this class instead of HashSet
when you care about the iteration order. When you iterate through a HashSet the
order is unpredictable, while a LinkedHashSet lets you iterate through the elements
in the order in which they were inserted
回答10:
A lot of answers have been given, based on technical considerations, especially around performance.
According to me, choice between TreeSet
and HashSet
matters.
But I would rather say the choice should be driven by conceptual considerations first.
If, for the objects your need to manipulate, a natural ordering does not make sense, then do not use TreeSet
.
It is a sorted set, since it implements SortedSet
. So it means you need to override function compareTo
, which should be consistent with what returns function equals
. For example if you have a set of objects of a class called Student, then I do not think a TreeSet
would make sense, since there is no natural ordering between students. You can order them by their average grade, okay, but this is not a \"natural ordering\". Function compareTo
would return 0 not only when two objects represent the same student, but also when two different students have the same grade. For the second case, equals
would return false (unless you decide to make the latter return true when two different students have the same grade, which would make equals
function have a misleading meaning, not to say a wrong meaning.)
Please note this consistency between equals
and compareTo
is optional, but strongly recommended. Otherwise the contract of interface Set
is broken, making your code misleading to other people, thus also possibly leading to unexpected behavior.
This link might be a good source of information regarding this question.
回答11:
Why have apples when you can have oranges?
Seriously guys and gals - if your collection is large, read and written to gazillions of times, and you\'re paying for CPU cycles, then the choice of the collection is relevant ONLY if you NEED it to perform better. However, in most cases, this doesn\'t really matter - a few milliseconds here and there go unnoticed in human terms. If it really mattered that much, why aren\'t you writing code in assembler or C? [cue another discussion]. So the point is if you\'re happy using whatever collection you chose, and it solves your problem [even if it\'s not specifically the best type of collection for the task] knock yourself out. The software is malleable. Optimise your code where necessary. Uncle Bob says Premature Optimisation is the root of all evil. Uncle Bob says so
回答12:
Message Edit ( complete rewrite ) When order does not matter, that\'s when. Both should give Log(n) - it would be of utility to see if either is over five percent faster than the other. HashSet can give O(1) testing in a loop should reveal whether it is.
回答13:
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
public class HashTreeSetCompare {
//It is generally faster to add elements to the HashSet and then
//convert the collection to a TreeSet for a duplicate-free sorted
//Traversal.
//really?
O(Hash + tree set) > O(tree set) ??
Really???? Why?
public static void main(String args[]) {
int size = 80000;
useHashThenTreeSet(size);
useTreeSetOnly(size);
}
private static void useTreeSetOnly(int size) {
System.out.println(\"useTreeSetOnly: \");
long start = System.currentTimeMillis();
Set<String> sortedSet = new TreeSet<String>();
for (int i = 0; i < size; i++) {
sortedSet.add(i + \"\");
}
//System.out.println(sortedSet);
long end = System.currentTimeMillis();
System.out.println(\"useTreeSetOnly: \" + (end - start));
}
private static void useHashThenTreeSet(int size) {
System.out.println(\"useHashThenTreeSet: \");
long start = System.currentTimeMillis();
Set<String> set = new HashSet<String>();
for (int i = 0; i < size; i++) {
set.add(i + \"\");
}
Set<String> sortedSet = new TreeSet<String>(set);
//System.out.println(sortedSet);
long end = System.currentTimeMillis();
System.out.println(\"useHashThenTreeSet: \" + (end - start));
}
}