I am working on a program that needs to create a grammatically correct sentence from a given set of words. The input is a list of strings, and the output should be a meaningful sentence made from those words plus any other words that are necessary. E.g.:
Input: {'You' , 'House' , 'Beautiful'}
Output: 'Your house is beautiful' (or) 'you house is beautiful'
Input: {'Father' , 'Love' , 'Child'}
Output: 'The father loves the child'
How do I implement this with NLTK and/or machine learning?
Any suggestions on how I should go about this? I'm open to even the wildest ideas. Thanks! :)
In this case you can apply an n-gram language model. The idea is that a sentence such as
I like NLP very much.
is split into the following 3-grams, where <s> and </s> mark the start and end of the sentence:
<s> I like
I like NLP
like NLP very
NLP very much
very much </s>
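A small plain-Python sketch that reproduces the 3-grams above. It assumes naive whitespace tokenization and simply drops the final period; the function name `sentence_ngrams` is hypothetical. Note that NLTK's `nltk.util.ngrams` with `pad_left=True` / `pad_right=True` pads with n-1 markers (e.g. `<s> <s> I`) rather than the single marker used in this example.

```python
def sentence_ngrams(sentence, n=3, start="<s>", end="</s>"):
    """Return the n-grams of a sentence padded with one start and one end marker."""
    # Naive tokenization (assumption): drop the final period and split on whitespace.
    tokens = [start] + sentence.rstrip(".").split() + [end]
    # Slide a window of size n over the padded token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


for gram in sentence_ngrams("I like NLP very much."):
    print(" ".join(gram))
```

This prints exactly the five 3-grams listed above, from `<s> I like` to `very much </s>`.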
Then you treat it as a probability model P(word3 | word1 word2), i.e. the probability of a word given the two words before it.
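One common way to estimate P(word3 | word1 word2) is by maximum likelihood from counts: C(word1 word2 word3) / C(word1 word2). The sketch below assumes a tiny made-up three-sentence corpus, lowercase-and-split tokenization, and hypothetical helper names; a real model would be trained on a large corpus and smoothed.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories, padding with one <s> and one </s>."""
    trigrams, histories = Counter(), Counter()
    for sentence in sentences:
        # Naive tokenization (assumption): lowercase, drop the final period, split on spaces.
        tokens = ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]
        for i in range(len(tokens) - 2):
            w1, w2, w3 = tokens[i:i + 3]
            trigrams[(w1, w2, w3)] += 1
            # (w1, w2) is only counted when followed by a word, so C(w1 w2)
            # equals the sum of C(w1 w2 w3) over all w3.
            histories[(w1, w2)] += 1
    return trigrams, histories


def trigram_prob(w1, w2, w3, trigrams, histories):
    """Maximum-likelihood estimate P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2)."""
    history = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / history if history else 0.0


corpus = [
    "The father loves the child.",
    "The mother loves the child.",
    "The father reads the book.",
]
trigrams, histories = train_trigram_counts(corpus)
print(trigram_prob("the", "father", "loves", trigrams, histories))  # 0.5
print(trigram_prob("loves", "the", "child", trigrams, histories))   # 1.0
```

"the father" is followed once by "loves" and once by "reads", hence 0.5; any trigram never seen in the data gets probability 0, which is why back-off or smoothing is needed in practice.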
So your work would be:
- Get a large collection of n-grams, i.e. counts of n consecutive words (I think https://books.google.com/ngrams has a download option)
- For a given set of words, find all n-grams which contain only those words
- Find the most likely combination.
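The three steps above can be sketched as a brute-force search. Several things here are assumptions rather than part of the answer: the toy corpus, the hypothetical list of filler words the program may add (just "the"), the helper names, and the choice to score a candidate by the product of unsmoothed trigram probabilities. Inflection is not handled, so the input uses "loves" rather than "Love"; and trying every permutation only works for a handful of words.

```python
import itertools
from collections import Counter


def count_trigrams(sentences):
    """Count trigrams and their two-word histories (one <s>/</s> pad per sentence)."""
    trigrams, histories = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]
        for i in range(len(tokens) - 2):
            trigrams[tuple(tokens[i:i + 3])] += 1
            histories[tuple(tokens[i:i + 2])] += 1
    return trigrams, histories


def sentence_prob(words, trigrams, histories):
    """Product of MLE trigram probabilities; 0.0 as soon as one trigram is unseen."""
    tokens = ["<s>", *words, "</s>"]
    prob = 1.0
    for i in range(len(tokens) - 2):
        seen = trigrams.get(tuple(tokens[i:i + 3]), 0)
        if seen == 0:
            return 0.0
        prob *= seen / histories[tuple(tokens[i:i + 2])]
    return prob


def best_sentence(content_words, fillers, trigrams, histories, max_fillers=2):
    """Try every ordering of the content words plus up to max_fillers filler words."""
    content = [w.lower() for w in content_words]
    # Step 2: keep only trigrams made of allowed words (plus the sentence markers).
    allowed = set(content) | set(fillers) | {"<s>", "</s>"}
    relevant = {g: c for g, c in trigrams.items() if set(g) <= allowed}
    # Step 3: brute-force search for the most probable combination.
    best, best_prob = None, 0.0
    for k in range(max_fillers + 1):
        for extra in itertools.combinations_with_replacement(fillers, k):
            for order in set(itertools.permutations(content + list(extra))):
                prob = sentence_prob(order, relevant, histories)
                if prob > best_prob:
                    best, best_prob = " ".join(order), prob
    return best  # None if no candidate is fully covered by the data


corpus = [
    "The father loves the child.",
    "The mother loves the child.",
    "The father reads the book.",
]
trigrams, histories = count_trigrams(corpus)
print(best_sentence(["Father", "loves", "Child"], ["the"], trigrams, histories))
# the father loves the child
```

With real data, many candidates would have non-zero scores, and a beam search over partial sentences would replace the exhaustive permutation loop.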
Please note:
- n should be at least 3
- the bigger n gets, the more often you will have to "back off" to a shorter n-gram because you have no data for a sequence (even though that n-gram might exist and make sense)
- even n=5 already means a VERY large amount of data
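To illustrate the "back off" idea, here is a sketch of one simple scheme, "stupid backoff" (Brants et al., 2007), swapped in for illustration; the answer does not say which back-off method to use. If a trigram was seen, its relative frequency is used; otherwise the score of the shorter n-gram is multiplied by a fixed factor (0.4 here). The corpus and function names are assumptions.

```python
from collections import Counter

ALPHA = 0.4  # fixed back-off penalty suggested for stupid backoff


def count_ngrams(sentences, max_n=3):
    """Count all 1- to max_n-grams, with one <s>/</s> pad per sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram, counts):
    """Relative frequency if the n-gram was seen, otherwise ALPHA times
    the score of the same n-gram without its first word."""
    if len(ngram) == 1:
        total = sum(c for g, c in counts.items() if len(g) == 1)
        return counts[ngram] / total
    if counts[ngram] > 0:
        return counts[ngram] / counts[ngram[:-1]]
    return ALPHA * backoff_score(ngram[1:], counts)


corpus = [
    "The father loves the child.",
    "The mother loves the child.",
    "The father reads the book.",
]
counts = count_ngrams(corpus)
print(backoff_score(("the", "father", "loves"), counts))  # 0.5 (trigram seen)
print(backoff_score(("mother", "reads", "the"), counts))  # 0.4 (backs off to "reads the")
```

These are scores, not normalized probabilities. Methods such as Katz back-off or Kneser-Ney smoothing yield proper probabilities; NLTK's `nltk.lm` module includes a `KneserNeyInterpolated` model.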