
Algorithms for Natural Language Understanding

Posted 2019-05-23 10:55

Question:

What algorithms could I use for natural language understanding (NLU)?

For example, let's say I want to start a program, and I have these two sentences:

"Let us start"

"Let him start"

Obviously, the first sentence should start the program, but not the second one (since it doesn't make sense).

Right now, I am using Stanford's NLP API and have implemented the TokensRegexAnnotator class:

CoreMapExpressionExtractor<MatchedExpression> extractor = CoreMapExpressionExtractor.createExtractorFromFile(env, "tr.txt");

So my code "knows" what "Start" should do, that is, "Start" should trigger/start the program. But "Start" could be used with anything, like "Start the car." In this case, I wouldn't want to "Start" the program because the sentence is about starting a car, not the program. To solve this, I used Stanford's CollapsedDependenciesAnnotation class:

SemanticGraph dependencies = s.get(CollapsedDependenciesAnnotation.class);
Iterable<SemanticGraphEdge> edge_set = dependencies.edgeIterable();

I used the nsubj dependency to check whether the subject was a PRP (pronoun), since I want the program to start only when the subject is a PRP. So when I entered the sentence "let us start" in my program, the program started. However, when I entered the sentence "Start the car," the program didn't start. All is working well...

BUT the program will also start when I enter the sentence "Let him start" (as mentioned above), because "him" is also a pronoun. I do not want the program to start when I enter this sentence, because "Let him start" has nothing to do with starting the program. So how will the program know this? What can I do to solve this problem? Are there algorithms that will let the computer differentiate between "let us start" and "let him start"?

Any ideas on how to solve this problem?

Thank you!

(I hope I am being clear)

Answer 1:

One way Stanford CoreNLP could help you is its TokensRegex functionality. With this tool you can write explicit patterns and then tag them in your input text. Then your code can react based on the presence of certain patterns.

Here are some links with more info:

http://nlp.stanford.edu/software/tokensregex.shtml

http://nlp.stanford.edu/software/regexner/

I would recommend identifying the common expressions that deserve a clear response, then building up the rule set until you get decent coverage of what users input.

For instance:

Let us (start|begin).
(Start|begin) the (program|software)
I'm ready to (start|begin)
etc...

Obviously you could combine these rules and make them increasingly complicated. But I think a straightforward approach would be to think of the various ways one might express that they want to begin, and then capture those with rules.
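To make the idea concrete, here is a minimal sketch of the three rules listed above. It uses plain java.util.regex on the whole sentence instead of TokensRegex, so it is only an approximation of what a TokensRegex rules file would do. The class name StartCommandRules and its methods are made up for illustration and are not part of CoreNLP.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not a CoreNLP class): plain-regex approximations
// of the example rules, applied to the whole normalized sentence.
final class StartCommandRules {
    private static final List<Pattern> RULES = List.of(
        Pattern.compile("let us (start|begin)"),
        Pattern.compile("(start|begin) the (program|software)"),
        Pattern.compile("i'm ready to (start|begin)")
    );

    // Lowercase, drop trailing sentence punctuation, collapse whitespace.
    static String normalize(String sentence) {
        return sentence.toLowerCase(Locale.ROOT)
            .replaceAll("[.!?]+\\s*$", "")
            .replaceAll("\\s+", " ")
            .trim();
    }

    // True only if the entire sentence matches one of the rules.
    static boolean isStartCommand(String sentence) {
        String text = normalize(sentence);
        for (Pattern rule : RULES) {
            if (rule.matcher(text).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
            "Let us start", "Let him start", "Start the car.", "Begin the software");
        for (String s : inputs) {
            System.out.println(s + " -> " + isStartCommand(s));
        }
    }
}
```

Because each rule has to match the full sentence, "Let him start" and "Start the car." fall through: neither "him" nor "car" appears in any rule. The trade-off is that every phrasing you want to accept has to be listed explicitly.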



Answer 2:

I have a quick solution for you if you are okay with using an online API: you can easily achieve this with Wit AI's cloud API, http://wit.ai/. You just create intents for your commands and specify the data that you want to extract, and you're good to go. If you aren't okay with that, you'll have to write the algorithms yourself to do what http://wit.ai/ does, which is what I ended up doing for my personal project because I wanted a self-contained system, i.e. one that doesn't use cloud APIs. As a heads up, the algorithm uses TokensRegex to find TokenSequencePatterns.
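For the self-contained route, a rough sketch of the "intents plus extracted data" idea might look like the following. This is not Wit.ai's API and not the algorithm from my project. It is a hypothetical illustration that uses plain java.util.regex rather than TokensRegex, and all names in it (IntentMatcher, Result, add, match) are invented. Each rule maps a pattern to an intent name, and the first capture group, if there is one, plays the role of the data to extract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local intent matcher; every name here is an assumption.
final class IntentMatcher {
    // Matched intent plus the extracted value (null if the rule captures nothing).
    static final class Result {
        final String intent;
        final String target;

        Result(String intent, String target) {
            this.intent = intent;
            this.target = target;
        }
    }

    private static final class Rule {
        final String intent;
        final Pattern pattern;

        Rule(String intent, Pattern pattern) {
            this.intent = intent;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Register one pattern for an intent; an intent may have several patterns.
    IntentMatcher add(String intent, String regex) {
        rules.add(new Rule(intent, Pattern.compile(regex, Pattern.CASE_INSENSITIVE)));
        return this;
    }

    // Return the first rule that matches the whole sentence, if any.
    Optional<Result> match(String sentence) {
        String text = sentence.trim().replaceAll("[.!?]+$", "").trim();
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(text);
            if (m.matches()) {
                String target = m.groupCount() >= 1 ? m.group(1) : null;
                return Optional.of(new Result(rule.intent, target));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        IntentMatcher matcher = new IntentMatcher()
            .add("start_program", "let us (?:start|begin)")
            .add("start_program", "(?:start|begin) the (program|software)");
        for (String s : List.of("Let us start", "Let him start", "Begin the software")) {
            System.out.println(s + " -> "
                + matcher.match(s).map(r -> r.intent).orElse("no intent"));
        }
    }
}
```

With these two rules, "Let him start" yields no intent at all, so the calling code can simply ignore it. A real system would need many more patterns per intent, or a trained classifier, to get reasonable coverage.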