Shortest path to transform one word into another

For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example:

cat -> bat -> bet -> bot -> bog -> dog

I've solved the problem using a breadth-first search (with the dictionary stored in a trie), but I am looking for something better.

Please give me some ideas for a more efficient method (in terms of speed and memory). Something ridiculous and/or challenging is preferred.

I asked one of my friends (he's a junior) and he said that there is no efficient solution to this problem. He said I would learn why when I took the algorithms course. Any comments on that?

We must move from word to word. We cannot go cat -> dat -> dag -> dog. We also have to print out the traversal.

Tags: algorithm shortest-path edit-distance hamming-distance

9 answers

叛逆

#2 · 2020-01-27 12:29

With an explicit dictionary, BFS is optimal, but its running time is proportional to the size of the graph, O(V + E). For words of n letters, the dictionary might have on the order of a^n entries, where a is the alphabet size. If the dictionary contains every possible word except the one that should be at the end of the chain, you will traverse all of them and still find nothing. This is ordinary graph traversal, but the graph may be exponentially large.

You may wonder if it is possible to do it faster - to browse the structure "intelligently" and do it in polynomial time. The answer is, I think, no.

The problem:

You're given a fast (linear) way to check whether a word is in the dictionary and two words u, v, and you must decide whether there is a sequence u -> a₁ -> a₂ -> ... -> a_n -> v.

is NP-hard.

Proof: Take some 3SAT instance, like

(p or q or not r) and (p or not q or r)

You'll start with 0 000 00 and must decide whether it is possible to reach 2 222 22.

The first character means "are we finished?", the next three bits control p, q, r, and the last two bits control the clauses.

Allowed words are:

Anything that starts with 0 and contains only 0's and 1's.
Anything that starts with 2 and is legal. This means that it consists of 0's and 1's (except that the first character is 2), all clause bits are set correctly according to the variable bits, and they are all set to 1 (so this shows that the formula is satisfiable).
Anything that starts with at least two 2's and then is composed of 0's and 1's (regular expression: 222* (0+1)*, like 22221101 but not 2212001).
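To make the three rules concrete, here is a minimal sketch of the dictionary membership test for the example formula only. The function name isAllowed and the fixed 6-character layout (flag, p, q, r, clause 1, clause 2) are assumptions made for illustration; they are not part of the original answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical membership test for the example formula
// (p or q or not r) and (p or not q or r).
// Assumed word layout (6 characters): flag, p, q, r, clause1, clause2.
bool isAllowed(const std::string& w) {
    if (w.size() != 6) return false;

    // Count the leading 2's; everything after them must be 0 or 1.
    std::size_t twos = 0;
    while (twos < w.size() && w[twos] == '2') ++twos;
    for (std::size_t i = twos; i < w.size(); ++i)
        if (w[i] != '0' && w[i] != '1') return false;

    // Rule 1: starts with 0 and contains only 0's and 1's.
    if (twos == 0) return w[0] == '0';

    // Rule 3: at least two 2's followed only by 0's and 1's.
    if (twos >= 2) return true;

    // Rule 2: exactly one leading 2. Both clauses must evaluate to true
    // under the variable bits, and both clause bits must be set to 1.
    bool p = w[1] == '1', q = w[2] == '1', r = w[3] == '1';
    bool clause1 = p || q || !r;
    bool clause2 = p || !q || r;
    return clause1 && clause2 && w[4] == '1' && w[5] == '1';
}
```

The point of the construction is that this check is cheap per word, while the set of words it accepts is exponentially large in the number of variables.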

To produce 2 222 22 from 0 000 00, you have to do it in this way:

(1) Flip the appropriate bits, e.g. reach 0 100 11 in three steps. This requires finding a 3SAT solution.

(2) Change the first character to 2: 2 100 11. At this point the dictionary verifies that this is indeed a 3SAT solution.

(3) Change 2 100 11 -> 2 200 11 -> 2 220 11 -> 2 222 11 -> 2 222 21 -> 2 222 22.

These rules ensure that you can't cheat (check this). Reaching 2 222 22 is possible only if the formula is satisfiable, and deciding that is NP-hard. I feel it might be even harder (#P or FNP, probably), but NP-hardness is enough for this purpose, I think.

Edit: You might be interested in the disjoint-set data structure. It takes your dictionary and groups words that can be reached from each other. You can also store a path from every vertex to the root or some other vertex. This will give you a path, not necessarily the shortest one.
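A rough sketch of the grouping idea, under some assumptions: the names DisjointSet and groupWords are made up for this example, and words are linked through wildcard patterns such as "c*t", which is one possible way to find one-letter neighbours and presumes that no word contains the character '*'. It only answers "are these two words connected at all?" and does not store paths.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal disjoint-set (union-find) using path halving.
struct DisjointSet {
    std::vector<int> parent;

    explicit DisjointSet(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each item is its own root
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Puts two words in the same set exactly when a one-letter-at-a-time
// chain through the word list connects them.
DisjointSet groupWords(const std::vector<std::string>& words) {
    DisjointSet ds(static_cast<int>(words.size()));
    // First word index seen for each wildcard pattern, e.g. "c*t".
    std::unordered_map<std::string, int> firstWithPattern;

    for (int i = 0; i < static_cast<int>(words.size()); ++i) {
        std::string pattern = words[i];
        for (std::size_t pos = 0; pos < pattern.size(); ++pos) {
            char saved = pattern[pos];
            pattern[pos] = '*';
            auto result = firstWithPattern.emplace(pattern, i);
            // Pattern already present: both words differ in at most this letter.
            if (!result.second) ds.unite(i, result.first->second);
            pattern[pos] = saved;
        }
    }
    return ds;
}
```

Two words with indices i and j are reachable from each other when ds.find(i) == ds.find(j), so a failed lookup lets you skip the search entirely.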

Fickle 薄情

#3 · 2020-01-27 12:33

This is a typical dynamic programming problem. Look up the Edit Distance problem.

傲

#4 · 2020-01-27 12:39

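#include <iostream>
#include <queue>
#include <set>
#include <string>
using namespace std;

// Returns true if a and b have the same length and differ
// in exactly one position
bool isadjacent(const string& a, const string& b)
{
  if (a.length() != b.length()) return false;

  int count = 0;  // to store count of differences
  int n = a.length();

  // Iterate through all characters and return false
  // if there are more than one mismatching characters
  for (int i = 0; i < n; i++)
  {
    if (a[i] != b[i]) count++;
    if (count > 1) return false;
  }
  return count == 1;
}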

// A queue item to store word and minimum chain length
// to reach the word.

struct QItem
{
  string word;
  int len;
};

// Returns length of shortest chain to reach 'target' from 'start'
// using minimum number of adjacent moves. D is dictionary

int shortestChainLen(string& start, string& target, set<string> &D)
{
  // Create a queue for BFS and insert 'start' as source vertex
  queue<QItem> Q;
  QItem item = {start, 1};  // Chain length for start word is 1
  Q.push(item);

  // While queue is not empty
  while (!Q.empty())
  {
    // Take the front word
    QItem curr = Q.front();
    Q.pop();

    // Go through all words of dictionary
    for (set<string>::iterator it = D.begin(); it != D.end(); )
    {
        // Process a dictionary word if it is adjacent to current
        // word (or vertex) of BFS
        string temp = *it;
        if (isadjacent(curr.word, temp))
        {
            // Add the dictionary word to Q
            item.word = temp;
            item.len = curr.len + 1;
            Q.push(item);

            // Remove from dictionary so that this word is not
            // processed again.  This is like marking visited.
            // erase() returns the next valid iterator (C++11), so
            // the loop never advances an invalidated one.
            it = D.erase(it);

            // If we reached target
            if (temp == target)
              return item.len;
        }
        else
        {
            ++it;
        }
    }
  }
  return 0;
}

// Driver program
int main()
{
  // make dictionary
  set<string> D;
  D.insert("poon");
  D.insert("plee");
  D.insert("same");
  D.insert("poie");
  D.insert("plie");
  D.insert("poin");
  D.insert("plea");
  string start = "toon";
  string target = "plea";
  cout << "Length of shortest chain is: "
       << shortestChainLen(start, target, D); 
  return 0; 
}

Copied from: https://www.geeksforgeeks.org/word-ladder-length-of-shortest-chain-to-reach-a-target-word/
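The code above returns only the chain length, while the question also asks to print the traversal. A minimal sketch of one way to do that is to record each word's predecessor during the BFS and walk back from the target. The function name shortestLadder is hypothetical, and the sketch assumes lowercase a-z words; it generates neighbours by trying every single-letter substitution instead of scanning the whole dictionary.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Returns one shortest ladder from start to target (both inclusive),
// or an empty vector if none exists. Every word after start must be in dict.
std::vector<std::string> shortestLadder(
        const std::string& start, const std::string& target,
        const std::unordered_set<std::string>& dict) {
    if (start.size() != target.size()) return {};

    // Doubles as the visited set: word -> predecessor in the BFS tree.
    std::unordered_map<std::string, std::string> cameFrom;
    cameFrom[start] = start;
    std::queue<std::string> frontier;
    frontier.push(start);

    while (!frontier.empty()) {
        std::string curr = frontier.front();
        frontier.pop();

        if (curr == target) {
            // Walk the predecessor links back to start, then reverse.
            std::vector<std::string> path;
            for (std::string w = target; w != start; w = cameFrom[w])
                path.push_back(w);
            path.push_back(start);
            std::reverse(path.begin(), path.end());
            return path;
        }

        // Try every single-letter substitution of curr.
        std::string next = curr;
        for (std::size_t i = 0; i < next.size(); ++i) {
            char saved = next[i];
            for (char c = 'a'; c <= 'z'; ++c) {
                if (c == saved) continue;
                next[i] = c;
                if (dict.count(next) && !cameFrom.count(next)) {
                    cameFrom[next] = curr;
                    frontier.push(next);
                }
            }
            next[i] = saved;
        }
    }
    return {};  // target is unreachable
}
```

Each word costs 25 * length hash lookups here, which tends to beat a full dictionary scan per word once the word list is large.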

Deceive 欺骗

#5 · 2020-01-27 12:41

What you are looking for is called the Edit Distance. There are many different types.

From (http://en.wikipedia.org/wiki/Edit_distance): "In information theory and computer science, the edit distance between two strings of characters is the number of operations required to transform one of them into the other."
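For reference, a small sketch of one common variant, the Levenshtein distance (insertions, deletions, and substitutions each cost 1), computed with the usual two-row dynamic programming table. The function name editDistance is chosen for this example. Note that it ignores the dictionary entirely, so for the word-ladder problem it can only serve as a lower bound on the number of steps, not as the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance between a and b, keeping only two rows of the table.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);

    // Turning the empty prefix of a into b[0..j) takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);  // delete all i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
            int del = prev[j] + 1;
            int ins = curr[j - 1] + 1;
            curr[j] = std::min({sub, del, ins});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```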

This article about Jazzy (the Java spell-check API) has a nice overview of these sorts of comparisons (it's a similar problem: providing suggested corrections): http://www.ibm.com/developerworks/java/library/j-jazzy/

beautiful°

#6 · 2020-01-27 12:43

You can make it a little quicker by removing the words that are not the right length, first. More of the limited dictionary will fit into the CPU's cache. Probably all of it.
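A minimal sketch of that filtering step, assuming the full word list is already in memory. PackedWords and packByLength are names invented for this example; storing the surviving words back to back in one buffer is one possible way to keep them cache-friendly, not something the word list format requires.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All words of one length, packed back to back in a single buffer.
struct PackedWords {
    std::size_t length;  // length shared by every stored word
    std::string flat;    // word i occupies flat[i * length, (i + 1) * length)

    std::size_t count() const { return length == 0 ? 0 : flat.size() / length; }
    std::string word(std::size_t i) const { return flat.substr(i * length, length); }
};

// Keeps only the words whose length matches, in their original order.
PackedWords packByLength(const std::vector<std::string>& all, std::size_t length) {
    PackedWords packed{length, {}};
    for (const std::string& w : all)
        if (w.size() == length) packed.flat += w;
    return packed;
}
```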

Also, all of the strncmp comparisons (assuming you made everything lowercase) can be memcmp comparisons, or even unrolled comparisons, which can be a speedup.

You could use some preprocessor magic and hard-compile the task for that word-length, or roll a few optimized variations of the task for common word lengths. All of those extra comparisons can 'go away' for pure unrolled fun.
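As a rough illustration of compiling the comparison for a fixed word length, here is a sketch that uses a C++ template parameter in place of preprocessor macros (a swapped-in technique with the same goal). The names differByOneFixed and differByOne, and the choice of lengths 3 to 5 as the "common" ones, are assumptions for this example.

```cpp
#include <cstddef>

// Compares two N-character words. N is a compile-time constant,
// so the compiler is free to unroll the loop completely.
template <std::size_t N>
bool differByOneFixed(const char* a, const char* b) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < N; ++i)
        mismatches += (a[i] != b[i]);
    return mismatches == 1;
}

// Runtime dispatch: fixed-length versions for a few lengths,
// and a general loop with early exit for everything else.
bool differByOne(const char* a, const char* b, std::size_t n) {
    switch (n) {
        case 3: return differByOneFixed<3>(a, b);
        case 4: return differByOneFixed<4>(a, b);
        case 5: return differByOneFixed<5>(a, b);
        default: {
            std::size_t mismatches = 0;
            for (std::size_t i = 0; i < n && mismatches < 2; ++i)
                mismatches += (a[i] != b[i]);
            return mismatches == 1;
        }
    }
}
```

Whether this actually beats a plain loop depends on the compiler and the data, so it is worth measuring before keeping it.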

Explosion°爆炸

#7 · 2020-01-27 12:46

You could find the longest common subsequence, and thereby find the letters that must be changed.
