python string replacement, all possible combinatio

2019-08-30 01:36发布

问题:

I have sentences like the following:

((wouldyou)) give me something ((please))

and a bunch of keywords, stored in arrays / lists:

keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]

I want to replace every occurrence of variables in parentheses with a suitable set of strings stored in an array and get every possible combination back. The amount of variables and keywords is undefined.

James helped me with the following code:

def filler(word, from_char, to_char):    
    options = [(c,) if c != from_char else (from_char, to_char) for c in word.split(" ")] 
    return (' '.join(o) for o in product(*options)) 
    list(filler('((?please)) tell me something ((?please))', '((?please))', ''))

It works great but only replaces one specific variable with empty strings. Now I want to go through various variables with different set of keywords. The desired result should look something like this:

can you give me something please
would you give me something please
please give me something please
can you give me something ASAP
would you give me something ASAP
please give me something ASAP

I guess it has something to do with to_ch, but I have no idea how to compare through list items at this place.

回答1:

This is a job for Captain Regex!

Partial, pseudo-codey, solution...

One direct, albeit inefficient (like O(n*m) where n is number of words to replace and m is average number of replacements per word), way to do this would be to use the regex functionality in the re module to match the words, then use the re.sub() method to swap them out. Then you could just embed that in nested loops. So (assuming you get your replacements into a dict or something first), it would look something like this:

for key in repldict:
  regexpattern = # construct a pattern on the fly for key
  for item in repldict[key]:
    newstring = re.sub(regexpattern, item)

And so forth. Only, you know, like with correct syntax and stuff. And then just append the newstring to a list, or print it, or whatever.

For creating the regexpatterns on the fly, string concatenation just should do it. Like a regex to match left parens, plus the string to match, plus a regex to match right parens.

If you do it that way, then you can handle the optional features just by looping over a second version of the regex pattern which appends a question mark to the end of the left parens, then does whatever you want to do with that.



回答2:

The following would work. It uses itertools.product to construct all of the possible pairings (or more) of your keywords.

import re, itertools

text = "((wouldyou)) give me something ((please))"

keywords = {}
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]

# Get a list of bracketed terms
lsources = re.findall("\(\((.*?)\)\)", text)

# Build a list of the possible substitutions 
ldests = []
for source in lsources:
    ldests.append(keywords[source])

# Generate the various pairings
for lproduct in itertools.product(*ldests):
    output = text
    for src, dest in itertools.izip(lsources, lproduct):
        # Replace each term (you could optimise this using a single re.sub)
        output = output.replace("((%s))" % src, dest)

    print output

You could further improve it by avoiding the need to do multiple replace() and assignment calls with one re.sub() call.

This scripts gives the following output:

can you give me something please
can you give me something ASAP
would you give me something please
would you give me something ASAP
please give me something please
please give me something ASAP

It was tested using Python 2.7. You will need to think how to solve it if multiple identical keywords were used. Hopefully you find this useful.