I have some degenerate tree
(it looks like as array or doubly linked list). For example, it is this tree:
Each edge has some weight. I want to find all equal paths, which starts in each vertex.
In other words, I want to get all tuples (v1, v, v2) where v1 and v2 are an arbitrary ancestor and descendant such that c(v1, v) = c(v, v2)
.
Let edges have the following weights (it is just example):
a-b = 3
b-c = 1
c-d = 1
d-e = 1
Then:
- The vertex
A
does not have any equal path (there is no vertex from left side). - The vertex
B
has one equal pair. The pathB-A
equals to the pathB-E
(3 == 3)
. - The vertex
C
has one equal pair. The pathB-C
equals to the pathC-D
(1 == 1)
. - The vertex
D
has one equal pair. The pathC-D
equals to the pathD-E
(1 == 1)
. - The vertex
E
does not have any equal path (there is no vertex from right side).
I implement simple algorithm, which works in O(n^2)
. But it is too slow for me.
You write, in comments, that your current approach is
There is a simpler and, I think, faster
O(n^2)
approach, based on the so called two pointers method.For each vertix
v
go at the same time into two possible directions. Have one "pointer" to a vertex (vl
) moving in one direction and another (vr
) into another direction, and try to keep the distance fromv
tovl
as close to the distance fromv
tovr
as possible. Each time these distances become equal, you have equal paths.(By precalculating the prefix sums, you can find
dist
in O(1).)It's easy to see that no equal pair will be missed provided that you do not have zero-length edges.
Regarding a faster solution, if you want to list all pairs, then you can't do it faster, because the number of pairs will be O(n^2) in the worst case. But if you need only the amount of these pairs, there might exist faster algorithms.
UPD: I came up with another algorithm for calculating the amount, which might be faster in case your edges are rather short. If you denote the total length of your chain (sum of all edges weight) as
L
, then the algorithm runs inO(L log L)
. However, it is much more advanced conceptually and more advanced in coding too.Firstly some theoretical reasoning. Consider some vertex
v
. Let us have two arrays,a
andb
, not the C-style zero-indexed arrays, but arrays with indexation from-L
toL
.Let us define
i>0
,a[i]=1
iff to the right ofv
on the distance exactlyi
there is a vertex, otherwisea[i]=0
i=0
,a[i]≡a[0]=1
i<0
,a[i]=1
iff to the left ofv
on the distance exactly-i
there is a vertex, otherwisea[i]=0
A simple understanding of this array is as follows. Stretch your graph and lay it along the coordinate axis so that each edge has the length equal to its weight, and that vertex
v
lies in the origin. Thena[i]=1
iff there is a vertex at coordinatei
.For your example and for vertex "b" chosen as
v
:For another array, array
b
, we define the values in a symmetrical way with respect to origin, as if we have inverted the direction of the axis:i>0
,b[i]=1
iff to the left ofv
on the distance exactlyi
there is a vertex, otherwiseb[i]=0
i=0
,b[i]≡b[0]=1
i<0
,b[i]=1
iff to the right ofv
on the distance exactly-i
there is a vertex, otherwiseb[i]=0
Now consider a third array
c
such thatc[i]=a[i]*b[i]
, asterisk here stays for ordinary multiplication. Obviouslyc[i]=1
iff the path of lengthabs(i)
to the left ends in a vertex, and the path of lengthabs(i)
to the right ends in a vertex. So fori>0
each position inc
that hasc[i]=1
corresponds to the path you need. There are also negative positions (c[i]=1
withi<0
), which just reflect the positive positions, and one more position wherec[i]=1
, namely positioni=0
.Calculate the sum of all elements in
c
. This sum will besum(c)=2P+1
, whereP
is the total number of paths which you need withv
being its center. So if you knowsum(c)
, you can easily determineP
.Let us now consider more closely arrays
a
andb
and how do they change when we change the vertexv
. Let us denotev0
the leftmost vertex (the root of your tree) anda0
andb0
the correspondinga
andb
arrays for that vertex.For arbitrary vertex
v
denoted=dist(v0,v)
. Then it is easy to see that for vertexv
the arraysa
andb
are just arraysa0
andb0
shifted byd
:It is obvious if you remember the picture with the tree stretched along a coordinate axis.
Now let us consider one more array,
S
(one array for all vertices), and for each vertexv
let us put the value ofsum(c)
into theS[d]
element (d
andc
depend onv
).More precisely, let us define array
S
so that for eachd
Once we know the
S
array, we can iterate over vertices and for each vertexv
obtain itssum(c)
simply asS[d]
withd=dist(v,v0)
, because for each vertexv
we havesum(c)=sum(a0[i+d]*b0[i-d])
.But the formula for
S
is very simple:S
is just the convolution of thea0
andb0
sequences. (The formula does not exactly follow the definition, but is easy to modify to the exact definition form.)So what we now need is given
a0
andb0
(which we can calculate inO(L)
time and space), calculate theS
array. After this, we can iterate overS
array and simply extract the numbers of paths fromS[d]=2P+1
.Direct application of the formula above is
O(L^2)
. However, the convolution of two sequences can be calculated inO(L log L)
by applying the Fast Fourier transform algorithm. Moreover, you can apply a similar Number theoretic transform (don't know whether there is a better link) to work with integers only and avoid precision problems.So the general outline of the algorithm becomes
(I call it
corrected_convolution
becauseS
is not exactly a convolution, but a very similar object for which a similar algorithm can be applied. Moreover, you can even defineS'[2*d]=S[d]=sum(a0[i+d]*b0[i-d])=sum(a0[i]*b0[i-2*d])
, and thenS'
is the convolution proper.)