The longest common substring problem according to wiki can be solved using a suffix tree.
From wiki:
The longest common substrings of a set of strings can be found by
building a generalised suffix tree for the strings, and then finding
the deepest internal nodes which have leaf nodes from all the strings
in the subtree below it
I don't get this.
Example: if I have:
ABCDE
and XABCZ
then the suffix tree is (some branches from XABCZ
omitted due to space):
The longest common substring is ABC
but it is not I can not see how the description of wiki helps here.
ABC
is not the deepest internal nodes with leaf nodes.
Any help to understand how this works?
Like what's been said before, your tree is incorrect.
This is what I get when running "ABCDE$XABCZ" through my code.
Suffix Tree code:
String = ABCDE$XABCZ$
End of word character 1 = $
└── (0)
├── (20) $
├── (22) ABC
│ ├── (15) DE$
│ └── (23) Z$
├── (24) BC
│ ├── (16) DE$
│ └── (25) Z$
├── (26) C
│ ├── (17) DE$
│ └── (27) Z$
├── (18) DE$
├── (19) E$
├── (21) XABCZ$
└── (28) Z$
In a (compact) suffix tree, you need to find the deepest internal node(s) which have leaf nodes from all the strings. If you have multiple nodes at the same depth, you have to compare the length of the string represented by that node. i.e. ABC, BC, and C all have the same depth, so you have to compare the length of ABC, BC, and C strings to see which is longer; which is ABC obviously.
Suffix Trie code:
└── null
├── A
│ └── B
│ └── C
│ ├── D
│ │ └── (E) ABCDE
│ └── (Z) ABCZ
├── B
│ └── C
│ ├── D
│ │ └── (E) BCDE
│ └── (Z) BCZ
├── C
│ ├── D
│ │ └── (E) CDE
│ └── (Z) CZ
├── D
│ └── (E) DE
├── (E) E
├── X
│ └── A
│ └── B
│ └── C
│ └── (Z) XABCZ
└── (Z) Z
In a (not compact) suffix trie, find the deepest internal node(s) which have leaf nodes from all the strings.
Hope it helps.
You have not actually drawn the suffix tree. Had you done it properly, at the root you would only have every possible character once. The tree only splits when a single letter can have multiple following suffixes. That forces common substrings together in the tree, which makes them findable.