Making a meaningful sentence from a given set of words

Posted 2019-03-29 07:28

Question:

I am working on a program that needs to build a grammatically correct sentence from a given set of words. The input is a list of strings, and the output should be a meaningful sentence made from those words plus any other words that are necessary. E.g.:

Input: {'You' , 'House' , 'Beautiful'}
Output: 'Your house is beautiful' (or) 'you house is beautiful' 
Input: {'Father' , 'Love' , 'Child'}
Output: 'The father loves the child'

How do I implement this with NLTK and/or machine learning?

Any suggestions on how I should go about this? I'm open to even the wildest ideas. Thanks! :)

Answer 1:

In this case you can apply an n-gram model. The idea is that a sentence

I like NLP very much.

is split into the following 3-grams, where <s> and </s> mark the start and end of the sentence:

  1. <s> I like
  2. I like NLP
  3. like NLP very
  4. NLP very much
  5. very much </s>

You then treat this as a probability model P(word3 | word1 word2): the probability of the third word given the two words before it.
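
To make this concrete, here is a minimal sketch in plain Python (no NLTK needed) that extracts the 3-grams listed above and estimates P(word3 | word1 word2) by simple counting. The function names `trigrams`, `train` and `prob` and the two-sentence toy corpus are made up for illustration; a real model would need far more data.

```python
from collections import Counter, defaultdict

def trigrams(tokens):
    """Return the 3-grams of a token list, padded with one <s> and one </s>."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

def train(sentences):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for w1, w2, w3 in trigrams(tokens):
            counts[(w1, w2)][w3] += 1
    return counts

def prob(counts, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1 w2); 0.0 if the context is unseen."""
    context = counts.get((w1, w2))
    if not context:
        return 0.0
    return context[w3] / sum(context.values())

# Toy corpus, invented for illustration only.
corpus = [
    "I like NLP very much".split(),
    "I like tea very much".split(),
]
counts = train(corpus)

print(trigrams(corpus[0])[0])            # ('<s>', 'I', 'like')
print(prob(counts, "I", "like", "NLP"))  # 0.5
```

"I like" is followed by "NLP" in one of the two toy sentences and by "tea" in the other, hence the estimate of 0.5.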

So your work would be:

  1. Collect a lot of data on sequences of n consecutive words (e.g. I think https://books.google.com/ngrams has a download option)
  2. For a given set of words, find all n-grams which contain only those words
  3. Find the most likely combination.
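
One possible reading of these steps, as a rough sketch: try every ordering of the given words and keep the one whose 3-grams are most probable under the counts. Everything here is an assumption for illustration: the name `best_order`, the toy corpus, and the add-one smoothing (used only so unseen 3-grams do not give log(0)). It also does not insert missing words such as "is" or "the"; those have to be in the input already.

```python
import math
from collections import Counter
from itertools import permutations

def trigrams(tokens):
    """3-grams of a token sequence, padded with one <s> and one </s>."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

def best_order(words, trigram_counts, context_counts, vocab_size):
    """Return the permutation of `words` with the highest smoothed log-probability."""
    def score(order):
        total = 0.0
        for w1, w2, w3 in trigrams(order):
            # Add-one smoothing (an assumption of this sketch, not part of the answer).
            numerator = trigram_counts[(w1, w2, w3)] + 1
            denominator = context_counts[(w1, w2)] + vocab_size
            total += math.log(numerator / denominator)
        return total
    return max(permutations(words), key=score)

# Toy corpus, invented for illustration only.
corpus = [
    "the father loves the child".split(),
    "the house is beautiful".split(),
]
trigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    for w1, w2, w3 in trigrams(sentence):
        trigram_counts[(w1, w2, w3)] += 1
        context_counts[(w1, w2)] += 1
# Vocabulary: every corpus word plus the end marker.
vocab_size = len({w for s in corpus for w in s} | {"</s>"})

best = best_order(["beautiful", "house", "is", "the"],
                  trigram_counts, context_counts, vocab_size)
print(" ".join(best))  # the house is beautiful
```

Trying all permutations grows factorially with the number of words, so this is only workable for short inputs; for longer ones some kind of pruned search would be needed.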

Please note:

  • n should be at least 3
  • the bigger n gets, the more likely it is that you will have to "back off" to a smaller n because you have no data for a given n-gram (even though that n-gram might exist and make sense)
  • even n=5 already means a VERY large amount of data
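
As an illustration of what "backing off" can look like, here is a small sketch of one simple scheme: if the full n-gram was never seen, drop its first word and score the shorter n-gram, multiplied by a fixed penalty. The penalty `alpha=0.4` and the names `count_ngrams` and `backoff_score` are assumptions for this example, and the result is a relative score rather than a normalized probability. Other backoff and smoothing schemes exist.

```python
from collections import Counter

def count_ngrams(sentences, max_n=3):
    """Count all 1-grams up to max_n-grams, padding with one <s> and one </s>."""
    counts = Counter()
    total_tokens = 0
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        total_tokens += len(padded)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[tuple(padded[i:i + n])] += 1
    return counts, total_tokens

def backoff_score(ngram, counts, total_tokens, alpha=0.4):
    """Score the last word of `ngram` given the words before it, backing off when unseen."""
    ngram = tuple(ngram)
    if len(ngram) == 1:
        # Base case: relative frequency of the single word.
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # Seen n-gram: relative frequency given its context.
        return counts[ngram] / counts[ngram[:-1]]
    # Unseen n-gram: drop the first word and apply the penalty.
    return alpha * backoff_score(ngram[1:], counts, total_tokens, alpha)

# Toy corpus, invented for illustration only.
counts, total_tokens = count_ngrams(["I like NLP very much".split()])

print(backoff_score(("I", "like", "NLP"), counts, total_tokens))    # 1.0
print(backoff_score(("tea", "like", "NLP"), counts, total_tokens))  # 0.4
```

The second call has never seen "tea like NLP", so it falls back to the 2-gram "like NLP" and multiplies its score by 0.4.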