python check string contains all characters

I'm reading in a long list of words, and I made a node for every word in the list. Each node has an attribute 'word' for their position in the list.

I am trying to connect a node to the next node if the next node is the previous node, with an addition of just one letter

I also alphabetically ordered each word per character, so that CAT -> ACT

I want to draw an edge from each unique starting word, to all of the possible chains, so I can see all the possible chains in the list.

For example

A -> AN -> TAN -> RANT

However A --x-> T

This is my attempt

for i in range(0, G.number_of_nodes()-1):

    if ( ( (len(G.node[i]['word'])+1)  == len(G.node[i+1]['word']) )      and (G.node[i]['word'] in G.node[i+1]['word'])):
        print G.node[i]['word'], G.node[i+1]['word']

Gave me this,

DGO DGOS
DGOS DGOSS
I IN
ELLMS ELLMSS
AEPRS AEPRSS
INW DINW
DINW DINWY

What the word list and the alphabetical list looks like

Why do I not see IN INW?

Also, AGNRT AGNRST should be on there but I don't understand why, along with a lot of other pairs

Where do you think I went wrong?

标签： python algorithm python-3.x nodes networkx

4条回答

戒情不戒烟

2楼-- · 2019-03-06 17:04

Looks like a formal languages problem. How do you handle looping nodes?

IN INW is in the list you gave.

AGNRT AGNRST are not in the list, because you started out with a single letter, that letter has to be in the next word for example I -> IN, but IN is not in AGNRT or AGNRST

0人赞添加讨论(0) 举报

男人必须洒脱

3楼-- · 2019-03-06 17:06

The problem is that you are only comparing words that appear right next to each other in the list, i.e. words i and i+1, e.g. I and IN appear next to each other, as do WIN and WIND, but IN and WIND are far apart. It seems you want to compare all possible words, which requires a more sophisticated algorithm. Here's an idea:

Make a dictionary where they keys are sorted words and the values are lists of actual words, e.g. {"ACT": ["CAT", "ACT", "TAC], ...}. A collections.defaultdict(list) will be useful for this.
Sort the full input list of words by length. You can use list.sort(key=len) assuming you have just a list of words.
Iterate through the list sorted by length. For each word, go through every subset of length n-1. Something like for i in range(len(word)): process(word[:i] + word[i+1:]). You may want to be careful about duplicates here.
For each subset, sort the subset and look it up in the dictionary. Make a link from every word in the dictionary's value (a list of actual words) to the bigger word.

0人赞添加讨论(0) 举报

爱情/是我丢掉的垃圾

4楼-- · 2019-03-06 17:21

You seem to be comparing each node with just one other node, so

"IN" directly follows "I" in your wordlist, but "INW" is not directly after "IN"

0人赞添加讨论(0) 举报

你好瞎i

5楼-- · 2019-03-06 17:22

You can use the 3rd party python library, python-levenshtein, to calculate the Levenshtein Distance which is the string edit distance. In your case, the only allowed 'edit' is the 'insertion' of the character on the next string/word on your list, so you will also need to verify that the length of the next word is 1 plus the previous word.

Here is the sample code that would achieve our stuff:

import Levenshtein as lvst

if len(word2) - len(word1) == 1 and lvst.distance(word1, word2) == 1:
    print(word1, word2)

You can install python-levenshtein by either apt-get (systemwide) or pip:

sudo apt-get install python-levenshtein

sudo apt-get install python3-levenshtein

pip install python-levenshtein

0人赞添加讨论(0) 举报

python check string contains all characters

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间