I am writing a program to traverse a binary search tree.Here's my code:
Main.java
public class Main {
public static void main(String[] args) {
BinaryTree binaryTree = new BinaryTree();
binaryTree.add(50);
binaryTree.add(40);
binaryTree.add(39);
binaryTree.add(42);
binaryTree.add(41);
binaryTree.add(43);
binaryTree.add(55);
binaryTree.add(65);
binaryTree.add(60);
binaryTree.inOrderTraversal(binaryTree.root);
}
}
Node.java
public class Node {
int data;
Node left;
Node right;
Node parent;
public Node(int d)
{
data = d;
left = null;
right = null;
}
}
BinaryTree.java
public class BinaryTree {
Node root = null;
public void add(int d)
{
Node newNode = new Node(d);
if(root!=null)
{
Node futureParent = root;
while(true)
{
if(newNode.data < futureParent.data) //going left
{
if(futureParent.left == null)
{
futureParent.left = newNode;
newNode.parent = futureParent;
break;
}
futureParent = futureParent.left;
}
else
{
if(futureParent.right == null)
{
futureParent.right = newNode;
newNode.parent = futureParent;
break;
}
futureParent = futureParent.right;
}
}
}
else
{
root = newNode;
}
}
public void inOrderTraversal(Node node)
{
if(node!=null)
{
inOrderTraversal(node.left);
System.out.println(node.data);
inOrderTraversal(node.right);
}
}
}
I understand the addition process perfectly but I have trouble understanding the traversal. Now, the tree I am working with, for better reference is this:
The first statement in the inOrderTraversal()
function visits 50,40 then 39 and finally hits null making the if condition false after which 39 is printed and is searched for a right child.After this the first statement stops executing and the stack unwinds for the 2nd and 3rd statements(inOrderTraversal(node.right)
and print(node.data)
) which leads to printing 40 and traversing to 41 which is the part I dont understand, i.e. how does the compiler restart statement 1 (inOrderTraversal(node.left)
) after it has stopped executing as soon as there is fresh stuff in the stack.
First of all, the
while
statement ininOrderTraversal
is wrong. None of the statements in thewhile
loop modifies the variablenode
so if it wasnull
it will always be and if it wasn't, it will never be. Change it to anif
.Having that, the way to look at recursion is often by induction. I claim the following induction hypothesis:
We can show by induction that this indeed happens. The simple case is when
node == null
. In that case,inOrderTraversal
prints nothing and returns directly, which is in line with the hypothesis.Now, assume we pass a non-empty tree. By induction,
inOrderTraversal(node.left)
prints the left subtree and returns. Then,node.data
is printed. Finally, again by induction,inOrderTraversal(node.right)
prints the right subtree and returns. Note that so far the current tree was printed in in-order traversal. Since I changed thewhile
to anif
the method returns, thereby meeting the induction hypothesis.Does this answer your question?
Your code doesn't work as it is, it will iterate forever over the node 39. The method inOrderTraversal() will indeed go to the left node, but will cycle over it forever because of the while. Each stack frame has it's own copy of the variables. When entering the method, variable node gets a copy of the object reference passed as argument.
One way to think about recursion is that is similar to using a while loop, but instead of while, you have an if. Here's how the method should look like:
When you traverse the tree, you want to print first the smaller value, which is stored in left most node, so you use
inOrderTraversal(node.left);
to get to if. When you arrive at a null node, this means it's parent is the left-most node, so you print it. After that you go to the right node and repeat the process. It's like splitting the tree in smaller sub trees until you can't divide them any more and print their value.Each time you call a method (recursive or not), a new stack frame is allocated (push onto the stack) and after the method finishes, the stack is removed (pop), freeing the space for garbage collection. These stacks frames are only a temporary space where local variables live. The object member variables live in another place called the heap which has a longer life duration than the stack.
The JVM handles the allocation of these spaces and the garbage collector frees them depending on the life of objects/variables. Depending on how much they live, there are a few generations (that's what they are called). All start on the eden (young) generation and if the garbage collector doesn't reclaim the space since they are still alive, they are moved to the survivor generation, after which if they still aren't collected, they move to the last one, the tenured generation. The longer the objects live, the rarer they are checked by the GC. This means that while the objects in the eden are collected pretty fast, the rest of the generations are checked not so often. There is also another space called the permanent generation (permgen) where constants used to live (like string literals) and where classes are stored.
You can get a clearer understanding of recursion and the stack by thinking about the classic recursion example, factorial.
(I used the
result
variable to make it easier to mark position when manually stepping through the code)Run through the execution of
factorial(5)
manually, using pieces of paper.Start by writing the function on one sheet of paper, replacing 'x' with 5. Then read through it, and when you come to a function call, put a pencil mark at your execution point, and get a new sheet of paper for your new function call.
Each time you do this, put the new piece of paper on top of the previous sheet. This is, literally, a stack of paper, and it accurately represents a computer stack. Each sheet of paper is a stack entry and it records where you were in the code, and what the values of local variables were, when you create it.
It's important to understand that this isn't special to recursive function calls. All function calls create a stack entry in this way.
The program execution doesn't get to browse through the stack. Only the top sheet of paper is accessible -- last in, first out (LIFO). When you get to
factorial(1)
, it doesn't call itself again, and you come to areturn
. When that happens, discard the top sheet of paper, write the return value into the new top layer, then continue stepping through the function on the top layer, from the point where you put the pencil mark.Carry on like this, and eventually you discard the last sheet. That means your program has finished and you have a final result.
Incidentally, if there's something wrong with your code, such there's no case where the function doesn't call itself, you will run out of paper (or your stack of paper will reach the ceiling) -- that is the stack overflow after which this site is named. The stack gets bigger than an imposed maximum and the runtime refuses to invoke the function again (in Java, by throwing an exception). You're likely to encounter this in your programming career -- common causes are a badly coded stop condition, or going round and round a circular data structure.
With the implementation above,
factorial(0)
would probably cause a stack overflow. Can you see why?This is how all conventional computer programs run. You place one item on the stack (in C and Java, that's
main()
). Each time a function call is made, the stack grows, and each time a function completes, the stack shrinks. The stack grows and shrinks until it eventually shrinks to nothing, at which point the program is done.For programs like yours, with two recursive calls in the same function, nothing is different. It's a good exercise to run through a small binary tree search manually with sheets of paper in the same way as we did with
factorial()
to see it working.It's also instructive to pause your Java code in a debugger to look at the state of the current stack -- or if you can't do that (learn to use a debugger soon!) put a
Thread.dumpStack()
somewhere in your code to see what it outputs.