O(1) algorithm to determine if node is descendant

2020-06-10 04:52发布

问题:

Imagine the following tree:

    A
   / \
  B   C
 / \   \
D   E   F

I'm looking for a way to query if for example F is a descendant of A (note: F doesn't need to be a direct descendant of A), which, in this particular case would be true. Only a limited amount of potential parent nodes need to be tested against a larger potential descendants node pool.

When testing whether a node is a descendant of a node in the potential parent pool, it needs to be tested against ALL potential parent nodes.

This is what a came up with:

  • Convert multiway tree to a trie, i.e. assign the following prefixes to every node in the above tree:

     A = 1
     B = 11
     C = 12
     D = 111
     E = 112
     F = 121
    
  • Then, reserve a bit array for every possible prefix size and add the parent nodes to be tested against, i.e. if C is added to the potential parent node pool, do:

      1    2    3  <- Prefix length
    
    *[1]  [1]  ...
     [2] *[2]  ...
     [3]  [3]  ...
     [4]  [4]  ...
     ...  ...
    
  • When testing if a node is a descendant of a potential parent node, take its trie prefix, lookup the first character in the first "prefix array" (see above) and if it is present, lookup the second prefix character in the second "prefix array" and so on, i.e. testing F leads to:

     F = 1    2    1
    
       *[1]  [1]  ...
        [2] *[2]  ...
        [3]  [3]  ...
        [4]  [4]  ...
        ...  ...
    

    so yes F, is a descendant of C.

This test seems to be worst case O(n), where n = maximum prefix length = maximum tree depth, so its worst case is exactly equal to the obvious way of just going up the tree and comparing nodes. However, this performs much better if the tested node is near the bottom of the tree and the potential parent node is somewhere at the top. Combining both algorithms would mitigate both worst case scenarios. However, memory overhead is a concern.

Is there another way for doing that? Any pointers greatly appreciated!

回答1:

Are your input trees always static? If so, then you can use a Lowest Common Ancestor algorithm to answer the is descendant question in O(1) time with an O(n) time/space construction. An LCA query is given two nodes and asked which is the lowest node in the tree whose subtree contains both nodes. Then you can answer the IsDescendent query with a single LCA query, if LCA(A, B) == A or LCA(A, B) == B, then one is the descendent of the other.

This Topcoder algorithm tuorial gives a thorough discussion of the problem and a few solutions at various levels of code complexity/efficiency.



回答2:

I don't know if this would fit your problem, but one way to store hierarchies in databases, with quick "give me everything from this node and downwards" features is to store a "path".

For instance, for a tree that looks like this:

    +-- b
    |
a --+       +-- d
    |       |
    +-- c --+
            |
            +-- e

you would store the rows as follows, assuming the letter in the above tree is the "id" of each row:

id    path
a     a
b     a*b
c     a*c
d     a*c*d
e     a*c*e

To find all descendants of a particular node, you would do a "STARTSWITH" query on the path column, ie. all nodes with a path that starts with a*c*

To find out if a particular node is a descendant of another node, you would see if the longest path started with the shortest path.

So for instance:

  • e is a descendant of a since a*c*e starts with a
  • d is a descendant of c since a*c*d starts with a*c

Would that be useful in your instance?



回答3:

Traversing any tree will require "depth-of-tree" steps. Therefore if you maintain balanced tree structure it is provable that you will need O(log n) operations for your lookup operation. From what I understand your tree looks special and you can not maintain it in a balanced way, right? So O(n) will be possible. But this is bad during creation of the tree anyways, so you will probably die before you use the lookup anyway...

Depending on how often you will need that lookup operation compared to insert, you could decide to pay during insert to maintain an extra data structure. I would suggest a hashing if you really need amortized O(1). On every insert operation you put all parents of a node into a hashtable. By your description this could be O(n) items on a given insert. If you do n inserts this sounds bad (towards O(n^2)), but actually your tree can not degrade that bad, so you probably get an amortized overall hastable size of O(n log n). (actually, the log n part depends on the degration-degree of your tree. If you expect it to be maximal degraed, don't do it.)

So, you would pay about O(log n) on every insert, and get hashtable efficiency O(1) for a lookup.



回答4:

For a M-way tree, instead of your bit array, why not just store the binary "trie id" (using M bits per level) with each node? For your example (assuming M==2) : A=0b01, B=0b0101, C=0b1001, ...

Then you can do the test in O(1):

bool IsParent(node* child, node* parent)
{ 
   return ((child->id & parent->id) == parent->id)
}

You could compress the storage to ceil(lg2(M)) bits per level if you have a fast FindMSB() function which returns the position of the most significant bit set:

mask = (1<<( FindMSB(parent->id)+1) ) -1;
retunr (child->id&mask == parent->id);


回答5:

In a pre-order traversal, every set of descendants is contiguous. For your example,

A B D E C F
+---------+ A
  +---+ B
    + D
      + E
        +-+ C
          + F

If you can preprocess, then all you need to do is number each node and compute the descendant interval.

If you can't preprocess, then a link/cut tree offers O(log n) performance for both updates and queries.



回答6:

You can answer query of the form "Is node A a descendant of node B?" in constant time, by just using two auxiliary arrays.

Preprocess the tree, by visiting in Depth-First order, and for each node A store its starting and ending time in the visit in the two arrays Start[] and End[].

So, let us say that End[u] and Start[u] are respectively the ending and starting time of the visit of node u.

Then node u is a descendant of node v if and only if:

Start[v] <= Start[u] and End[u] <= End[v].

and you are done, checking this condition requires just two lookup in the arrays Start and End



回答7:

Take a look at Nested set model It's very effective to select but too slow to update