Change binary search tree to balance

2019-09-21 07:12发布

问题:

I have just learnt how to create a binary search data structure, which is going to be used to store thousands of words from a dictionary. The problem that I am getting is that it is taking a long time to count add and remove data. Usually 199263ms or 200 seconds for 100000 words to count. I was told that having a tree that can self balance will improve the efficiency and make the operations faster.

My question is how can I make my tree auto balance so to make it efficient. I have made slight improvements by eliminating duplicate words to make the height of the tree to be shorter.

If someone can give me advice on how i can make the tree efficient and how I can implement balancing tree in java will be helpful.

回答1:

You should look into red/black trees, which are self balancing. The nodes store a color in addition to an element, and every time the tree is modified, you rebalance the tree so that it meets the properties of a red/black tree:

(From Wikipedia:)

  1. Each node is either red or black.

  2. The root is black.

  3. All leaves (NIL) are black.

  4. If a node is red, then both its children are black.

  5. Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.

To get started implementing a red black tree, I recommend looking at this example implementation on github, and reading this explanation of red black trees.



回答2:

To balance a binary tree, it may be easier to just construct a new one, adding elements in a better order

BinaryTree balance(BinaryTree tree)
{
    BinaryTree out = new BinaryTree();
    String[] values = tree.toArray(); //a sorted array
    for(int i = Integer.highestOneBit(values.length); i > 0; i >>= 1)
        for(int j = i; j <= values.length; j += i)
            out.add(values[j - 1]);
    return out;
}

By extension of this, if the words read in do not need to be put into a tree and sorted right away, Arrays.sort(Object[]) is likely to be faster

List<String> wordList = new LinkedList<String>();
BufferedReader reader = [...];
for(String line = reader.readLine(); line != null; line = reader.readLine())
    wordList.add(line);
String[] words = wordList.toArray(new String[0]);
Arrays.sort(words);
BinaryTree tree = new BinaryTree();
for(int i = Integer.highestOneBit(words.length); i > 0; i >>= 1)
    for(int j = i; j <= words.length; j += i)
        out.add(words[j - 1]);

Depending on what you're actually using this data for (just a lookup table?) it may be faster to use a HashSet instead

Set<String> dict = new HashSet<String>();
BufferedReader reader = [...];
for(String line = reader.readLine(); line != null; line = reader.readLine())
    dict.add(line);