Computer AI algorithm to write sentences?

Posted 2019-03-08 06:40

I am searching for information on algorithms to process text sentences or to follow a structure when creating sentences that are valid in a normal human language such as English. I would like to know if there are projects working in this field that I can go learn from or start using.

For example, if I gave a program a noun, provided it with a thesaurus (for related words) and part-of-speech (so it understood where each word belonged in a sentence) - could it create a random, valid sentence?

I'm sure there are many sub-sections of this kind of research so any leads into this would be great.

4 answers
chillily
#2 · 2019-03-08 06:53

Writing random sentences is not that hard. The simple English grammar example from any parser textbook can be run in reverse to generate grammatically correct nonsense sentences.
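To make the "run the grammar in reverse" idea concrete, here is a minimal sketch. The grammar and the `generate` function are made up for this illustration (they are not taken from any particular textbook): instead of parsing words up to `S`, it starts at `S` and expands each symbol with a randomly chosen production until only words are left.

```python
import random

# Toy grammar, invented for this example. Each nonterminal maps to a list of
# alternative expansions; anything that is not a key is treated as a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["Vt", "NP"], ["Vi"]],
    "Det": [["the"], ["a"]],
    "Adj": [["green"], ["sleepy"]],
    "N": [["dog"], ["cat"], ["parser"]],
    "Vt": [["eats"], ["sees"]],      # transitive verbs take an object
    "Vi": [["sleeps"], ["runs"]],    # intransitive verbs do not
}

def generate(symbol="S", rng=random):
    """Expand `symbol` top-down, picking a random production at each step."""
    if symbol not in GRAMMAR:        # terminal: an actual word
        return [symbol]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words

if __name__ == "__main__":
    for _ in range(3):
        print(" ".join(generate()))
```

The output is grammatical only because the toy grammar is tiny and hand-checked; the sentences carry no intended meaning.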

Another way is the word-tuple-random-walk, made popular by the old BYTE magazine TRAVESTY, or stuff like http://www.perlmonks.org/index.pl?node_id=94856

#3 · 2019-03-08 06:56

Yes. There is some work on solving NLG problems with AI techniques. As far as I know, there is currently no method mature enough for practical use.

If you have the background, I suggest getting familiar with the work of Alexander Koller from Saarland University. He describes how to encode NLG as a PDDL planning problem. The main article you'll want to read is "Sentence generation as a planning problem".

If you do not have any background in NLP, just search for the online courses or course materials by Michael Collins or Dan Jurafsky.

再贱就再见
#4 · 2019-03-08 06:58

This is called NLG (Natural Language Generation), although that is mainly the task of generating text that describes a set of data. There is also a lot of research on completely random sentence generation.

One starting point is to use Markov chains to generate sentences. The idea is that you have a transition matrix that says how likely it is to transition from each part of speech to every other one. You also have the most likely starting and ending parts of speech of a sentence. Put this all together and you can generate likely sequences of parts of speech.

Now, you are far from done. First of all, this will not give a very good result, because you are only considering the probability between adjacent items (also called bigrams). So what you want to do is extend this to look, for instance, at the transition matrix between three parts of speech (this makes a 3D matrix and gives you trigrams). You can extend it to 4-grams, 5-grams, etc., depending on your processing power and on whether your corpus is large enough to fill such a matrix.

Lastly, you need to patch up things such as agreement (subject-verb agreement, adjective-noun agreement (not in English, though), etc.) and tense, so that everything is consistent.
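A rough sketch of the bigram version of this idea follows. All probabilities, tag names, and the tiny lexicon are hand-written assumptions for illustration; in practice you would estimate them from a tagged corpus. The transition table is stored as nested dicts rather than a literal matrix, and a special `END` tag stands in for the "most likely ending" information.

```python
import random

# Hand-written toy numbers (assumptions, not estimated from any corpus).
START = {"DET": 0.7, "PRON": 0.3}
TRANSITIONS = {
    "DET": {"ADJ": 0.4, "NOUN": 0.6},
    "ADJ": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.6, "END": 0.4},
    "PRON": {"VERB": 1.0},
    "VERB": {"DET": 0.6, "END": 0.4},
}
LEXICON = {
    "DET": ["the", "a"],
    "ADJ": ["small", "red"],
    "NOUN": ["dog", "cat"],
    "PRON": ["she", "it"],
    "VERB": ["sees", "likes"],
}

def sample(dist, rng):
    """Draw one key from a {key: probability} dict."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]

def generate_tags(rng, max_len=12):
    """Random walk over part-of-speech tags until END or max_len."""
    tags = [sample(START, rng)]
    while len(tags) < max_len:
        nxt = sample(TRANSITIONS[tags[-1]], rng)
        if nxt == "END":
            break
        tags.append(nxt)
    return tags

def realize(tags, rng):
    """Fill each tag with a random word of that part of speech."""
    return [rng.choice(LEXICON[t]) for t in tags]

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(" ".join(realize(generate_tags(rng), rng)))
```

Because each step only sees the previous tag, this sketch can happily produce verbless fragments or chains like DET NOUN VERB DET NOUN VERB, which is exactly the weakness that motivates moving to trigrams and beyond.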
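As a very small illustration of the agreement step, the sketch below picks a present-tense verb form that matches the subject's number. The mini-lexicon and the function names `agree` and `clause` are hypothetical and cover only a handful of words; real systems rely on a morphological lexicon or analyzer.

```python
# Hypothetical mini-lexicon, made up for this illustration.
NOUN_NUMBER = {"dog": "sg", "dogs": "pl", "parser": "sg", "parsers": "pl"}
VERB_FORMS = {
    "chase": {"sg": "chases", "pl": "chase"},
    "be": {"sg": "is", "pl": "are"},
}

def agree(subject, verb_lemma):
    """Return the present-tense verb form matching the subject's number."""
    number = NOUN_NUMBER[subject]
    return VERB_FORMS[verb_lemma][number]

def clause(subject, verb_lemma, rest):
    """Build a simple clause with the verb inflected to agree with the subject."""
    return f"the {subject} {agree(subject, verb_lemma)} {rest}"

if __name__ == "__main__":
    print(clause("dogs", "chase", "the parser"))  # the dogs chase the parser
    print(clause("parser", "be", "slow"))         # the parser is slow
```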

Evening l夕情丶
#5 · 2019-03-08 07:03

The field you're looking for is called natural language generation, a subfield of natural language processing: http://en.wikipedia.org/wiki/Natural_language_processing

Sentence generation is either really easy or really hard depending on how good you want the sentences to be. Currently, there are no programs that can generate 100% sensible sentences about given nouns (even with a thesaurus), if that is what you mean.

If, on the other hand, you would be satisfied with nonsense that is sometimes ungrammatical, then you could try an n-gram based sentence generator. These just chain together words that tend to appear in sequence, and 3-gram and 4-gram generators look quite okay sometimes (although you'll recognize them as what generates a lot of spam email).
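For a feel of how little is involved, here is a minimal word-level n-gram generator using only the standard library. The function names `build_model` and `generate` and the one-line corpus are assumptions for this sketch; with a real corpus you would pass in a much longer text.

```python
import random
from collections import defaultdict

def build_model(text, n=3):
    """Map each (n-1)-word context to the list of words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context].append(words[i + n - 1])
    return model

def generate(model, length=20, rng=random):
    """Start from a random context and keep appending a random follower."""
    if not model:                      # corpus shorter than n words
        return ""
    context = rng.choice(list(model))
    out = list(context)
    while len(out) < length:
        followers = model.get(tuple(out[-len(context):]))
        if not followers:              # dead end: context never continued
            break
        out.append(rng.choice(followers))
    return " ".join(out)

if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ate the fish"
    print(generate(build_model(corpus, n=3), length=10))
```

Keeping duplicate followers in the list means frequent continuations are picked more often, so no explicit probabilities are needed. A larger `n` gives more fluent output but needs far more text before the generator stops copying the corpus verbatim.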

Here's an intro to the basics of n-gram based generation, using NLTK: http://www.nltk.org/book/ch02.html#generating-random-text-with-bigrams
