I am trying to use NLTK for semantic parsing of spoken navigation commands such as "go to San Francisco", "give me directions to 123 Main Street", etc.
This could be done with a fairly simple CFG such as:
S -> COMMAND LOCATION
COMMAND -> "go to" | "give me directions to" | ...
LOCATION -> CITY | STREET | ...
The problem is that this involves multi-word literals such as "go to", which NLTK doesn't seem to be set up for (correct me if I am wrong). Parsing appears to have tagging as a prerequisite, and every tagger I have looked at tags individual words. So my options seem to be:
a) Define a custom tagger that can assign non-syntactic tags to word sequences rather than individual words (e.g., "go to" : "COMMAND").
b) Use features to augment the grammar, e.g., something like:
COMMAND -> VB[sem='go'] P[sem='to'] | ...
c) Use a chunker to extract sub-structures like COMMAND, then apply a parser to the result. Does NLTK allow chunker->parser cascading?
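To make option (a) concrete, here is a rough sketch of what I have in mind, written in plain Python without any NLTK calls. The phrase table `PHRASE_TAGS` and the function `tag_phrases` are hypothetical names I made up for illustration. It does a greedy longest-match pass over the tokens and merges known word sequences into single tagged units:

```python
# Hypothetical phrase table: maps a lowercase word sequence to a non-syntactic tag.
PHRASE_TAGS = {
    ("go", "to"): "COMMAND",
    ("give", "me", "directions", "to"): "COMMAND",
    ("san", "francisco"): "CITY",
}
MAX_LEN = max(len(phrase) for phrase in PHRASE_TAGS)


def tag_phrases(tokens):
    """Greedy longest-match tagging of word sequences.

    Returns a list of (text, tag) pairs; words that match no phrase get tag None.
    """
    lowered = [t.lower() for t in tokens]
    tagged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            tag = PHRASE_TAGS.get(tuple(lowered[i:i + n]))
            if tag is not None:
                tagged.append((" ".join(tokens[i:i + n]), tag))
                i += n
                break
        else:
            # No phrase starts here: emit the single word untagged.
            tagged.append((tokens[i], None))
            i += 1
    return tagged


print(tag_phrases("go to San Francisco".split()))
# [('go to', 'COMMAND'), ('San Francisco', 'CITY')]
```

The resulting tag sequence (COMMAND, CITY) would then be handed to a parser whose terminals are the tags rather than the words, which is the part I am unsure how to do cleanly in NLTK.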
Some of these options seem convoluted (hacks). Is there a cleaner, more standard way to handle multi-word literals?