I am trying to use NLTK for semantic parsing of spoken navigation commands such as "go to San Francisco", "give me directions to 123 Main Street", etc.
This could be done with a fairly simple CFG, such as:
S -> COMMAND LOCATION
COMMAND -> "go to" | "give me directions to" | ...
LOCATION -> CITY | STREET | ...
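For what it's worth, here is roughly how that grammar can be written directly in NLTK (the place name and the hand-grouped token list below are made up for illustration). As far as I can tell, the parser matches each terminal against exactly one input token, so the multi-word literals only work if the tokens have already been grouped:

import nltk

# Toy grammar; NLTK accepts multi-word terminals in quotes, but each
# terminal is matched against a single input token.
grammar = nltk.CFG.fromstring("""
S -> COMMAND LOCATION
COMMAND -> 'go to' | 'give me directions to'
LOCATION -> CITY | STREET
CITY -> 'San Francisco'
STREET -> '123 Main Street'
""")

parser = nltk.ChartParser(grammar)

# This only parses because the tokens were grouped by hand:
for tree in parser.parse(['go to', 'San Francisco']):
    print(tree)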
The problem is that this involves non-atomic (multi-word) literals such as "go to", which NLTK doesn't seem to be set up for (correct me if I am wrong). The parsing task has tagging as a prerequisite, and all taggers always seem to tag individual words. So, my options seem to be:
a) Define a custom tagger that can assign non-syntactic tags to word sequences rather than individual words (e.g., "go to" : "COMMAND").
b) Use features to augment the grammar, e.g., something like:
COMMAND -> VB[sem='go'] P[sem='to'] | ...
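In case it helps, here is a rough sketch of what (b) could look like with NLTK's feature grammars; the sem values and the toy lexicon are invented for illustration:

import nltk
from nltk.parse import FeatureChartParser

# Toy feature grammar for option (b); sem values and lexicon are invented.
fgrammar = nltk.grammar.FeatureGrammar.fromstring("""
S -> COMMAND LOCATION
COMMAND -> VB[sem='go'] P[sem='to']
LOCATION -> CITY
VB[sem='go'] -> 'go'
P[sem='to'] -> 'to'
CITY -> 'Boston'
""")

parser = FeatureChartParser(fgrammar)
for tree in parser.parse('go to Boston'.split()):
    print(tree)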
c) Use a chunker to extract sub-structures like COMMAND, then apply a parser to the result. Does NLTK allow chunker->parser cascading?
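For (c), the chunking stage could look roughly like the following (the tag patterns are guesses tuned to these two example commands, and as far as I can tell the chunker's output still has to be handed to a separate parser manually; I haven't found a built-in chunker-to-parser pipeline):

import nltk  # needs the punkt and averaged_perceptron_tagger data

tokens = nltk.word_tokenize("give me directions to San Francisco")
tagged = nltk.pos_tag(tokens)

# Hypothetical chunk patterns covering only the two example commands.
chunker = nltk.RegexpParser(r"""
COMMAND: {<VB><PRP><NNS><TO>}   # "give me directions to"
         {<VB|VBP><TO>}         # "go to"
LOCATION: {<CD>?<NNP>+}         # "San Francisco", "123 Main Street"
""")
print(chunker.parse(tagged))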
Some of these options seem like convoluted hacks. Is there a good way to do this?
It seems like you want to identify imperatives.
This answer has looked into that and contains a solution similar to your option (a), though a bit different in that it lets the tagger do most of the work. (b) does indeed seem a bit hacky... but you're building a pretty custom application, so it could work! I would do (c) the other way around: parse first, then chunk based on the CFG in (a).
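As a rough illustration of the pre-grouping idea (not necessarily what the linked answer does), NLTK's MWETokenizer can merge a known list of multi-word sequences into single tokens, which a CFG with multi-word terminals could then consume; the command inventory below is made up:

from nltk.tokenize import MWETokenizer

# Merge known multi-word commands into single tokens; separator=' ' keeps
# the merged token readable. The command list is hand-written here.
mwe = MWETokenizer([('go', 'to'), ('give', 'me', 'directions', 'to')],
                   separator=' ')

print(mwe.tokenize('give me directions to 123 Main Street'.split()))
# ['give me directions to', '123', 'Main', 'Street']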
Overall, however, as the other answer explains, there doesn't seem to be a perfect way to do this just yet.
You might also want to look at pattern.en. Their