Making a meaningful sentence from a given set of w

Posted 2019-03-29 07:28

Question:

I am working on a program that needs to create a grammatically correct sentence from a given set of words. I will pass a list of strings to the program, and the output should be a meaningful sentence made from those words plus a few other words that are necessary. E.g.:

Input: {'You' , 'House' , 'Beautiful'}
Output: 'Your house is beautiful' (or) 'you house is beautiful' 
Input: {'Father' , 'Love' , 'Child'}
Output: 'The father loves the child'

How do I implement this with NLTK and/or machine learning?

Any suggestions as to how I should go about this? I'm open to even the wildest ideas. Thanks! :)

Answer 1:

In this case you can apply an n-gram model. The idea is that a sentence

I like NLP very much.

gets the following 3-grams:

<s> I like
I like NLP
like NLP very
NLP very much
very much </s>

You then treat this as a probability model P(word3 | word1 word2): the probability of the third word given the two words before it.
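
To make the 3-gram listing and the conditional probability concrete, here is a minimal sketch in plain Python. The two-sentence `corpus`, the helper names `trigrams` and `prob`, and the use of raw relative frequencies (no smoothing) are assumptions for illustration only; a real model would be estimated from a far larger corpus. NLTK offers `nltk.util.ngrams` for the extraction step, but the sketch avoids the dependency.

```python
from collections import Counter

def trigrams(tokens):
    """Return the 3-grams of a sentence padded with one <s> and one </s>."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

# Toy corpus, made up for illustration.
corpus = [
    "I like NLP very much".split(),
    "I like tea very much".split(),
]

trigram_counts = Counter()   # count(w1 w2 w3)
context_counts = Counter()   # count(w1 w2) as the start of a trigram
for sentence in corpus:
    for w1, w2, w3 in trigrams(sentence):
        trigram_counts[(w1, w2, w3)] += 1
        context_counts[(w1, w2)] += 1

def prob(w3, w1, w2):
    """Relative-frequency estimate of P(w3 | w1 w2); 0.0 for unseen contexts."""
    seen = context_counts[(w1, w2)]
    return trigram_counts[(w1, w2, w3)] / seen if seen else 0.0

print(trigrams("I like NLP very much".split()))
print(prob("NLP", "I", "like"))  # 0.5: "I like" is followed by "NLP" once out of twice
```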

So your work would be:

1. Get a large collection of n-grams, i.e. sequences of n consecutive words (e.g. I think https://books.google.com/ngrams has a download option).
2. For a given set of words, find all n-grams which contain only those words.
3. Find the most likely combination.
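
The steps above can be sketched as a brute-force search: score every ordering of the input words with trigram counts and keep the best one. This is only a sketch under several assumptions that are not in the answer: the tiny `corpus` is made up, add-one smoothing is used so unseen trigrams do not zero out a candidate, and `score` and `best_order` are hypothetical helper names. It only reorders the words it is given; adding words such as "is" or "the", or inflecting "love" to "loves" as in the question's examples, would need extra machinery. The number of orderings grows factorially, so this is practical only for small inputs.

```python
import math
from collections import Counter
from itertools import permutations

def trigrams(tokens):
    """Return the 3-grams of a sentence padded with one <s> and one </s>."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

# Toy corpus, made up for illustration; real counts would come from a large n-gram dataset.
corpus = [
    "your house is beautiful".split(),
    "your garden is beautiful".split(),
    "the house is big".split(),
]

trigram_counts = Counter()
context_counts = Counter()
vocab = {"</s>"}  # every token that can appear as the third word of a trigram
for sentence in corpus:
    vocab.update(sentence)
    for w1, w2, w3 in trigrams(sentence):
        trigram_counts[(w1, w2, w3)] += 1
        context_counts[(w1, w2)] += 1

def score(tokens):
    """Sum of add-one-smoothed log P(w3 | w1 w2) over the candidate's trigrams."""
    total = 0.0
    for w1, w2, w3 in trigrams(tokens):
        numerator = trigram_counts[(w1, w2, w3)] + 1
        denominator = context_counts[(w1, w2)] + len(vocab)
        total += math.log(numerator / denominator)
    return total

def best_order(words):
    """Try every ordering of the words and return the highest-scoring one."""
    return max(permutations(sorted(words)), key=score)

print(" ".join(best_order({"beautiful", "house", "is", "your"})))
# prints: your house is beautiful
```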

Please note:

- n should be at least 3
- the bigger n gets, the more likely it is that you have to "back off" to a shorter n-gram because you have no data for the long one (even though that n-gram might exist and make sense)
- even n=5 already means a VERY large amount of data
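
As a rough illustration of what "backing off" can look like, the sketch below falls back from the trigram to the bigram and then to the unigram whenever the longer n-gram was never observed, multiplying by a fixed discount each time. This follows the spirit of the "stupid backoff" scheme; the discount `ALPHA = 0.4`, the toy `corpus`, and the function names are assumptions for illustration, and the result is a score rather than a normalized probability.

```python
from collections import Counter

ALPHA = 0.4  # assumed discount applied each time we shorten the context

def count_ngrams(sentences, max_n=3):
    """Count all n-grams of order 1..max_n over sentences padded with <s> and </s>."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[tuple(padded[i:i + n])] += 1
    return counts

def backoff_score(word, context, counts, total_tokens):
    """Score `word` after `context`, shortening the context when data is missing."""
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total_tokens
    full = counts[context + (word,)]
    if full > 0:
        return full / counts[context]
    # Unseen n-gram: drop the oldest context word and discount the result.
    return ALPHA * backoff_score(word, context[1:], counts, total_tokens)

# Toy corpus, made up for illustration.
corpus = ["I like NLP very much".split(), "I like tea".split()]
counts = count_ngrams(corpus)
total = sum(c for gram, c in counts.items() if len(gram) == 1)

print(backoff_score("NLP", ("I", "like"), counts, total))    # trigram was seen
print(backoff_score("tea", ("you", "like"), counts, total))  # falls back to the bigram "like tea"
```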