I have just learnt how to create a binary search data structure, which is going to be used to store thousands of words from a dictionary. The problem that I am getting is that it is taking a long time to count add and remove data. Usually 199263ms or 200 seconds for 100000 words to count. I was told that having a tree that can self balance will improve the efficiency and make the operations faster.
My question is how can I make my tree auto balance so to make it efficient. I have made slight improvements by eliminating duplicate words to make the height of the tree to be shorter.
If someone can give me advice on how i can make the tree efficient and how I can implement balancing tree in java will be helpful.
You should look into red/black trees, which are self balancing. The nodes store a color in addition to an element, and every time the tree is modified, you rebalance the tree so that it meets the properties of a red/black tree:
(From Wikipedia:)
Each node is either red or black.
The root is black.
All leaves (NIL) are black.
If a node is red, then both its children are black.
Every path from a given node to any of its descendant NIL nodes
contains the same number of black nodes.
To get started implementing a red black tree, I recommend looking at this example implementation on github, and reading this explanation of red black trees.
To balance a binary tree, it may be easier to just construct a new one, adding elements in a better order
BinaryTree balance(BinaryTree tree)
{
BinaryTree out = new BinaryTree();
String[] values = tree.toArray(); //a sorted array
for(int i = Integer.highestOneBit(values.length); i > 0; i >>= 1)
for(int j = i; j <= values.length; j += i)
out.add(values[j - 1]);
return out;
}
By extension of this, if the words read in do not need to be put into a tree and sorted right away, Arrays.sort(Object[])
is likely to be faster
List<String> wordList = new LinkedList<String>();
BufferedReader reader = [...];
for(String line = reader.readLine(); line != null; line = reader.readLine())
wordList.add(line);
String[] words = wordList.toArray(new String[0]);
Arrays.sort(words);
BinaryTree tree = new BinaryTree();
for(int i = Integer.highestOneBit(words.length); i > 0; i >>= 1)
for(int j = i; j <= words.length; j += i)
out.add(words[j - 1]);
Depending on what you're actually using this data for (just a lookup table?) it may be faster to use a HashSet
instead
Set<String> dict = new HashSet<String>();
BufferedReader reader = [...];
for(String line = reader.readLine(); line != null; line = reader.readLine())
dict.add(line);