Tarjan's Algorithm: Time Complexity and slight

2019-02-28 01:04发布

问题:

This question is related to but not the same as the one recently asked here.

I just read the Wikipedia psuedocode.

algorithm tarjan is
  input: graph G = (V, E)
  output: set of strongly connected components (sets of vertices)

  index := 0
  S := empty
  for each v in V do
    if (v.index is undefined) then
      strongconnect(v)
    end if
  end for

  function strongconnect(v)
    // Set the depth index for v to the smallest unused index
    v.index := index
    v.lowlink := index
    index := index + 1
    S.push(v)

    // Consider successors of v
    for each (v, w) in E do
      if (w.index is undefined) then
        // Successor w has not yet been visited; recurse on it
        strongconnect(w)
        v.lowlink  := min(v.lowlink, w.lowlink)
      else if (w is in S) then
        // Successor w is in stack S and hence in the current SCC
        v.lowlink  := min(v.lowlink, w.index)
      end if
    end for

    // If v is a root node, pop the stack and generate an SCC
    if (v.lowlink = v.index) then
      start a new strongly connected component
      repeat
        w := S.pop()
        add w to current strongly connected component
      until (w = v)
      output the current strongly connected component
    end if
  end function

I obviously mustn't have understood it properly since I have two very basic questions:

  1. When we say if (w is in S), then isn't this operation of O(N) or atleast O(logN) complexity because the elements shall be ordered by their indices? We will have to perform this for each new node accessible from the root node, so isn't the overall complexity O(NlogN). Moreover S is a stack so conceptually only the top element should be accessible, how do we implement search in it? Shouldn't a binary search tree be a better data structure here?

  2. In this part:

    else if (w is in S) then
    v.lowlink := min(v.lowlink, w.index)

Is there a specific reason for using w.index and not w.lowlink? The advantage for using w.lowlink would be that it would answer the previous question (the linked question). The LLs for all the nodes in an SCC would guaranteed be the same for all nodes.

回答1:

1) Your first question: It can easily been done in O(1) , just maintain a boolean array inStack, the moment node n is put in the stack, flag inStack[n] to true. When you pop it off the stack, flag it back to false.

2) Not much different between w.index and w.lowlink , but this is easier to read, as we will understand that this condition is to check for the case from Node A ->B ->C ->A, which checking when node C can reach a predecessor node A or not. Keep in mind that at the moment we updating C, node A lowlink is not updated properly yet.

The Tarjan algorithm is based on the fact that a node will be the root of a SCC if and only if from that node, we cannot reach any predecessor node (which means it has the lowest lowlink in its SCC and also equals to this node's index). So the condition is just implemented this idea in the most straight forward way, if we encounter a node that is already visited, we check whether this node is a predecessor of the current node or not (which is determined by its index , also the level of this node in the graph)



回答2:

Actually I figured out a way to check for w to be in S or not in constant time. Simply have a boolean field inStack.

While pushing node w in stack, make w.inStack = true and while popping it, make it false.

But the second question still remains. Can we make the slight modification without disturbing the algorithm?



回答3:

  1. For the second question, 'Can we make the slight modification', my guess is you can. For the lowlink value is to indicate to a node, who has no more un-visited neighbors and which has dfs index = lowlink that it is the root of a component, and thus can pop the stack and print.

    Hence a component nodes, lowlink value can be set to any value greater than or equal to the dfs index of the components root.