I need to find all English words which can be formed from the letters in a string
sentence="Ziegler's Giant Bar"
I can make an array of letters by
sentence.split(//)
How can I make more than 4500 English words from the sentence in Ruby?
[edit]
It may be best to split the problem into parts:
- to make only an array of words with 10 letters or less
- the longer words can be looked up separately
[Assuming you can reuse the source letters within one word]: For each word in your dictionary list, construct two arrays of letters - one for the candidate word and one for the input string. Subtract the input array-of-letters from the word array-of-letters and if there weren't any letters left over, you've got a match. Code to do that looks like this:
You can call that function from the irb debugger like so:
...or here's a wrapper you could use to call the function interactively from a script:
When running this on a Mac, the output looks like this:
That is well over 4500 words, but that's because the Mac word dictionary is pretty large. If you want to reproduce Knuth's results exactly, download and unzip Knuth's dictionary from here: http://www.packetstormsecurity.org/Crackers/wordlists/dictionaries/knuth_words.gz and replace "/usr/share/dict/words" with the path to wherever you've unpacked the substitute directory. If you did it right you'll get 4514 words, ending in this collection:
I believe that answers the original question.
Alternatively, the questioner/reader might have wanted to list all the words one can construct from a string without reusing any of the input letters. My suggested code to accomplish that works as follows: Copy the candidate word, then for each letter in the input string, destructively remove the first instance of that letter from the copy (using "slice!"). If this process absorbs all the letters, accept that word.
You can get an array of letters like so:
If you want to find words whose letters and frequency thereof are restricted by the given phrase, you can construct a regex to do this for you:
Positive lookaheads let you make a regex that matches a position in the string where some specified pattern matches without consuming the part of the string that matches. We use them here to match the same string against multiple patterns in a single regex. The position only matches if all our patterns match.
If we allow infinite reuse of letters from the original phrase (like Knuth did according to glenra's comment), then it's even easier to construct a regex:
I don't think that Ruby has an English dictionary. But you could try to store all permutations of the original string in an array, and check those strings against Google? Say that a word is actually a word, if has more than 100.000 hits or something?