I'm reading in a long list of words, and I made a node for every word in the list. Each node has an attribute 'word' for their position in the list.
I am trying to connect a node to the next node if the next node is the previous node, with an addition of just one letter
I also alphabetically ordered each word per character, so that CAT -> ACT
I want to draw an edge from each unique starting word, to all of the possible chains, so I can see all the possible chains in the list.
For example
A -> AN -> TAN -> RANT
However A --x-> T
This is my attempt
for i in range(0, G.number_of_nodes()-1):
if ( ( (len(G.node[i]['word'])+1) == len(G.node[i+1]['word']) ) and (G.node[i]['word'] in G.node[i+1]['word'])):
print G.node[i]['word'], G.node[i+1]['word']
Gave me this,
DGO DGOS DGOS DGOSS I IN ELLMS ELLMSS AEPRS AEPRSS INW DINW DINW DINWY
What the word list and the alphabetical list looks like
Why do I not see IN INW?
Also, AGNRT AGNRST should be on there but I don't understand why, along with a lot of other pairs
Where do you think I went wrong?
Looks like a formal languages problem. How do you handle looping nodes?
IN INW is in the list you gave.
AGNRT AGNRST are not in the list, because you started out with a single letter, that letter has to be in the next word for example I -> IN, but IN is not in AGNRT or AGNRST
The problem is that you are only comparing words that appear right next to each other in the list, i.e. words
i
andi+1
, e.g.I
andIN
appear next to each other, as doWIN
andWIND
, butIN
andWIND
are far apart. It seems you want to compare all possible words, which requires a more sophisticated algorithm. Here's an idea:{"ACT": ["CAT", "ACT", "TAC], ...}
. Acollections.defaultdict(list)
will be useful for this.list.sort(key=len)
assuming you have just a list of words.n-1
. Something likefor i in range(len(word)): process(word[:i] + word[i+1:])
. You may want to be careful about duplicates here.You seem to be comparing each node with just one other node, so
"IN" directly follows "I" in your wordlist, but "INW" is not directly after "IN"
You can use the 3rd party python library,
python-levenshtein
, to calculate the Levenshtein Distance which is the string edit distance. In your case, the only allowed 'edit' is the 'insertion' of the character on the next string/word on your list, so you will also need to verify that the length of the next word is 1 plus the previous word.Here is the sample code that would achieve our stuff:
You can install
python-levenshtein
by eitherapt-get
(systemwide) orpip
:sudo apt-get install python-levenshtein
or
sudo apt-get install python3-levenshtein
or
pip install python-levenshtein