For a Data Structures project, I must find the shortest path between two words (like "cat"
and "dog"
), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example:
cat -> bat -> bet -> bot -> bog -> dog
I've solved the problem using a breadth first search, but am seeking something better (I represented the dictionary with a trie).
Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred.
I asked one of my friends (he's a junior) and he said that there is no efficient solution to this problem. He said I would learn why when I took the algorithms course. Any comments on that?
We must move from word to word. We cannot go cat -> dat -> dag -> dog
. We also have to print out the traversal.
My gut feeling is that your friend is correct, in that there isn't a more efficient solution, but that is assumming you are reloading the dictionary every time. If you were to keep a running database of common transitions, then surely there would be a more efficient method for finding a solution, but you would need to generate the transitions beforehand, and discovering which transitions would be useful (since you can't generate them all!) is probably an art of its own.
NEW ANSWER
Given the recent update, you could try A* with the Hamming distance as a heuristic. It's an admissible heuristic since it's not going to overestimate the distance
OLD ANSWER
You can modify the dynamic-program used to compute the Levenshtein distance to obtain the sequence of operations.
EDIT: If there are a constant number of strings, the problem is solvable in polynomial time. Else, it's NP-hard (it's all there in wikipedia) .. assuming your friend is talking about the problem being NP-hard.
EDIT: If your strings are of equal length, you can use Hamming distance.
There are methods of varying efficiency for finding links - you can construct a complete graph for each word length, or you can construct a BK-Tree, for example, but your friend is right - BFS is the most efficient algorithm.
There is, however, a way to significantly improve your runtime: Instead of doing a single BFS from the source node, do two breadth first searches, starting at either end of the graph, and terminating when you find a common node in their frontier sets. The amount of work you have to do is roughly half what is required if you search from only one end.