Finding last element of a binary heap

2019-01-17 16:43发布

问题:

quoting Wikipedia:

It is perfectly acceptable to use a traditional binary tree data structure to implement a binary heap. There is an issue with finding the adjacent element on the last level on the binary heap when adding an element which can be resolved algorithmically...

Any ideas on how such an algorithm might work?

I was not able to find any information about this issue, for most binary heaps are implemented using arrays.

Any help appreciated.


Recently, I have registered an OpenID account and am not able to edit my initial post nor comment answers. That's why I am responding via this answer. Sorry for this.


quoting Mitch Wheat:

@Yse: is your question "How do I find the last element of a binary heap"?

Yes, it is. Or to be more precise, my question is: "How do I find the last element of a non-array-based binary heap?".

quoting Suppressingfire:

Is there some context in which you're asking this question? (i.e., is there some concrete problem you're trying to solve?)

As stated above, I would like to know a good way to "find the last element of a non-array-based binary heap" which is necessary for insertion and deletion of nodes.

quoting Roy:

It seems most understandable to me to just use a normal binary tree structure (using a pRoot and Node defined as [data, pLeftChild, pRightChild]) and add two additional pointers (pInsertionNode and pLastNode). pInsertionNode and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This gives O(1) access to both insertion point and last node of the structure.

Yes, this should work. If I am not mistaken, it could be a little bit tricky to find the insertion node and the last node, when their locations change to another subtree due to an deletion/insertion. But I'll give this a try.

quoting Zach Scrivena:

How about performing a depth-first search...

Yes, this would be a good approach. I'll try that out, too.

Still I am wondering, if there is a way to "calculate" the locations of the last node and the insertion point. The height of a binary heap with N nodes can be calculated by taking the log (of base 2) of the smallest power of two that is larger than N. Perhaps it is possible to calculate the number of nodes on the deepest level, too. Then it was maybe possible to determine how the heap has to be traversed to reach the insertion point or the node for deletion.

回答1:

Basically, the statement quoted refers to the problem of resolving the location for insertion and deletion of data elements into and from the heap. In order to maintain "the shape property" of a binary heap, the lowest level of the heap must always be filled from left to right leaving no empty nodes. To maintain the average O(1) insertion and deletion times for the binary heap, you must be able to determine the location for the next insertion and the location of the last node on the lowest level to use for deletion of the root node, both in constant time.

For a binary heap stored in an array (with its implicit, compacted data structure as explained in the Wikipedia entry), this is easy. Just insert the newest data member at the end of the array and then "bubble" it into position (following the heap rules). Or replace the root with the last element in the array "bubbling down" for deletions. For heaps in array storage, the number of elements in the heap is an implicit pointer to where the next data element is to be inserted and where to find the last element to use for deletion.

For a binary heap stored in a tree structure, this information is not as obvious, but because it's a complete binary tree, it can be calculated. For example, in a complete binary tree with 4 elements, the point of insertion will always be the right child of the left child of the root node. The node to use for deletion will always be the left child of the left child of the root node. And for any given arbitrary tree size, the tree will always have a specific shape with well defined insertion and deletion points. Because the tree is a "complete binary tree" with a specific structure for any given size, it is very possible to calculate the location of insertion/deletion in O(1) time. However, the catch is that even when you know where it is structurally, you have no idea where the node will be in memory. So, you have to traverse the tree to get to the given node which is an O(log n) process making all inserts and deletions a minimum of O(log n), breaking the usually desired O(1) behavior. Any search ("depth-first", or some other) will be at least O(log n) as well because of the traversal issue noted and usually O(n) because of the random nature of the semi-sorted heap.

The trick is to be able to both calculate and reference those insertion/deletion points in constant time either by augmenting the data structure ("threading" the tree, as mention in the Wikipedia article) or using additional pointers.

The implementation which seems to me to be the easiest to understand, with low memory and extra coding overhead, is to just use a normal simple binary tree structure (using a pRoot and Node defined as [data, pParent, pLeftChild, pRightChild]) and add two additional pointers (pInsert and pLastNode). pInsert and pLastNode will both be updated during the insertion and deletion subroutines to keep them current when the data within the structure changes. This implementation gives O(1) access to both insertion point and last node of the structure and should allow preservation of overall O(1) behavior in both insertion and deletions. The cost of the implementation is two extra pointers and some minor extra code in the insertion/deletion subroutines (aka, minimal).

EDIT: added pseudocode for an O(1) insert()

Here is pseudo code for an insert subroutine which is O(1), on average:

define Node = [T data, *pParent, *pLeft, *pRight]

void insert(T data)
{
    do_insertion( data );   // do insertion, update count of data items in tree

    # assume: pInsert points node location of the tree that where insertion just took place
    #   (aka, either shuffle only data during the insertion or keep pInsert updated during the bubble process)

    int N = this->CountOfDataItems + 1;     # note: CountOfDataItems will always be > 0 (and pRoot != null) after an insertion

    p = new Node( <null>, null, null, null);        // new empty node for the next insertion

    # update pInsert (three cases to handle)
    if ( int(log2(N)) == log2(N) )
        {# #1 - N is an exact power of two
        # O(log2(N))
        # tree is currently a full complete binary tree ("perfect")
        # ... must start a new lower level
        # traverse from pRoot down tree thru each pLeft until empty pLeft is found for insertion
        pInsert = pRoot;
        while (pInsert->pLeft != null) { pInsert = pInsert->pLeft; }    # log2(N) iterations
        p->pParent = pInsert;
        pInsert->pLeft = p;
        }
    else if ( isEven(N) )
        {# #2 - N is even (and NOT a power of 2)
        # O(1)
        p->pParent = pInsert->pParent;
        pInsert->pParent->pRight = p;
        }
    else 
        {# #3 - N is odd
        # O(1)
        p->pParent = pInsert->pParent->pParent->pRight;
        pInsert->pParent->pParent->pRight->pLeft = p;
        }
    pInsert = p;

    // update pLastNode
    // ... [similar process]
}

So, insert(T) is O(1) on average: exactly O(1) in all cases except when the tree must be increased by one level when it is O(log N), which happens every log N insertions (assuming no deletions). The addition of another pointer (pLeftmostLeaf) could make insert() O(1) for all cases and avoids the possible pathologic case of alternating insertion & deletion in a full complete binary tree. (Adding pLeftmost is left as an exercise [it's fairly easy].)



回答2:

My first time to participate in stack overflow.

Yes, the above answer by Zach Scrivena (god I don't know how to properly refer to other people, sorry) is right. What I want to add is a simplified way if we are given the count of nodes.

The basic idea is:

Given the count N of nodes in this full binary tree, do "N % 2" calculation and push the results into a stack. Continue the calculation until N == 1. Then pop the results out. The result being 1 means right, 0 means left. The sequence is the route from root to target position.

Example:

The tree now have 10 nodes, I want insert another node at position 11. How to route it?

11 % 2 = 1  --> right    (the quotient is 5, and push right into stack)
 5 % 2 = 1  --> right    (the quotient is 2, and push right into stack)
 2 % 2 = 0  --> left     (the quotient is 1, and push left into stack. End)

Then pop the stack: left -> right -> right. This is the path from the root.



回答3:

You could use the binary representation of the size of the Binary Heap to find the location of the last node in O(log N). The size could be stored and incremented which would take O(1) time. The the fundamental concept behind this is the structure of the binary tree.

Suppose our heap size is 7. The binary representation of 7 is, "111". Now, remember to always omit the first bit. So, now we are left with "11". Read from left-to-right. The bit is '1', so, go to the right child of the root node. Then the string left is "1", the first bit is '1'. So, again go to the right child of the current node you are at. As you no longer have bits to process, this indicates that you have reached the last node. So, the raw working of the process is that, convert the size of the heap into bits. Omit the first bit. According to the leftmost bit, go to the right child of the current node if it is '1', and to the left child of the current node if it is '0'.

As you always to to the very end of the binary tree this operation always takes O(log N) time. This is a simple and accurate procedure to find the last node.

You may not understand it in the first reading. Try working this method on the paper for different values of Binary Heap, I'm sure you'll get the intuition behind it. I'm sure this knowledge is enough to solve your problem, if you want more explanation with figures, you can refer to my blog.

Hope my answer has helped you, if it did, let me know...! ☺



回答4:

How about performing a depth-first search, visiting the left child before the right child, to determine the height of the tree. Thereafter, the first leaf you encounter with a shorter depth, or a parent with a missing child would indicate where you should place the new node before "bubbling up".


The depth-first search (DFS) approach above doesn't assume that you know the total number of nodes in the tree. If this information is available, then we can "zoom-in" quickly to the desired place, by making use of the properties of complete binary trees:

Let N be the total number of nodes in the tree, and H be the height of the tree.

Some values of (N,H) are (1,0), (2,1), (3,1), (4,2), ..., (7,2), (8, 3). The general formula relating the two is H = ceil[log2(N+1)] - 1. Now, given only N, we want to traverse from the root to the position for the new node, in the least number of steps, i.e. without any "backtracking". We first compute the total number of nodes M in a perfect binary tree of height H = ceil[log2(N+1)] - 1, which is M = 2^(H+1) - 1.

If N == M, then our tree is perfect, and the new node should be added in a new level. This means that we can simply perform a DFS (left before right) until we hit the first leaf; the new node becomes the left child of this leaf. End of story.

However, if N < M, then there are still vacancies in the last level of our tree, and the new node should be added to the leftmost vacant spot. The number of nodes that are already at the last level of our tree is just (N - 2^H + 1). This means that the new node takes spot X = (N - 2^H + 2) from the left, at the last level.

Now, to get there from the root, you will need to make the correct turns (L vs R) at each level so that you end up at spot X at the last level. In practice, you would determine the turns with a little computation at each level. However, I think the following table shows the big picture and the relevant patterns without getting mired in the arithmetic (you may recognize this as a form of arithmetic coding for a uniform distribution):

0 0 0 0 0 X 0 0 <--- represents the last level in our tree, X marks the spot!
          ^
L L L L R R R R <--- at level 0, proceed to the R child
L L R R L L R R <--- at level 1, proceed to the L child
L R L R L R L R <--- at level 2, proceed to the R child 
          ^                      (which is the position of the new node)
          this column tells us
          if we should proceed to the L or R child at each level

EDIT: Added a description on how to get to the new node in the shortest number of steps assuming that we know the total number of nodes in the tree.



回答5:

Solution in case you don't have reference to parent !!! To find the right place for next node you have 3 cases to handle

  • case (1) Tree level is complete Log2(N)
  • case (2) Tree node count is even
  • case (3) Tree node count is odd

Insert:

void Insert(Node root,Node n)
{
Node parent = findRequiredParentToInsertNewNode (root);
if(parent.left == null)
parent.left = n;
else
parent.right = n;
}

Find the parent of the node in order to insert it

void findRequiredParentToInsertNewNode(Node root){

Node last = findLastNode(root);

 //Case 1 
 if(2*Math.Pow(levelNumber) == NodeCount){
     while(root.left != null)
      root=root.left;
  return root;
 }
 //Case 2
 else if(Even(N)){
   Node n =findParentOfLastNode(root ,findParentOfLastNode(root ,last));
 return n.right;
 }
//Case 3
 else if(Odd(N)){
  Node n =findParentOfLastNode(root ,last);
 return n;
 }

}

To find the last node you need to perform a BFS (breadth first search) and get the last element in the queue

 Node findLastNode(Node root)
 {
     if (root.left == nil)
         return root

     Queue q = new Queue();

     q.enqueue(root);
     Node n = null;

     while(!q.isEmpty()){
      n = q.dequeue();
     if ( n.left != null )
         q.enqueue(n.left);
     if ( n.right != null )
         q.enqueue(n.right);
        }
   return n;
}

Find the parent of the last node in order to set the node to null in case replacing with the root in removal case

 Node findParentOfLastNode(Node root ,Node lastNode)
{
    if(root == null)
        return root;

    if( root.left == lastNode || root.right == lastNode )
        return root;

    Node n1= findParentOfLastNode(root.left,lastNode);
    Node n2= findParentOfLastNode(root.left,lastNode);

    return n1 != null ? n1 : n2;
}


回答6:

I know this is an old thread but i was looking for a answer to the same question. But i could not afford to do an o(log n) solution as i had to find the last node thousands of times in a few seconds. I did have a O(log n) algorithm but my program was crawling because of the number of times it performed this operation. So after much thought I did finally find a fix for this. Not sure if anybody things this is interesting.

This solution is O(1) for search. For insertion it is definitely less than O(log n), although I cannot say it is O(1).

Just wanted to add that if there is interest, i can provide my solution as well. The solution is to add the nodes in the binary heap to a queue. Every queue node has front and back pointers.We keep adding nodes to the end of this queue from left to right until we reach the last node in the binary heap. At this point, the last node in the binary heap will be in the rear of the queue. Every time we need to find the last node, we dequeue from the rear,and the second-to-last now becomes the last node in the tree. When we want to insert, we search backwards from the rear for the first node where we can insert and put it there. It is not exactly O(1) but reduces the running time dramatically.