My question is similar to this unanswered question: "Using custom POS tags for NLTK chunking?", but the error I am getting is different. I am trying to parse a sentence to which I have added my own domain-specific tags.
For example:
(u'greatest', 'P'), (u'internet', 'NN'), (u'ever', 'A'),
(u',', ','), (u'and', 'CC'), (u'its', 'PRP$'), (u'being', 'VBG'),
(u'slow', 'N'), (u'as', 'IN'), (u'hell', 'NN')
where (u'slow', 'N') carries the custom tag 'N'.
I am trying to parse this using the following:
grammar=r"""
Chunk:`{<A>?*<P>+}`
"""
parser=nltk.RegexpParser(grammar)
But I am getting the following error:
ValueError: Illegal chunk pattern: `{<A>?*<P>+}`
Does nltk.RegexpParser process custom tags? Is there any other NLTK or Python-based parser that can do that?
nltk.RegexpParser can process custom tags.
Here is how you can modify your code to work:
This is the result you would get for your test data:
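A minimal sketch of such a fix, assuming the intended pattern is "zero or more A tags followed by one or more P tags" and that the backticks and the `?*` sequence are what the parser rejects. The corrected pattern `{<A>*<P>+}` is a guess at the intent, not necessarily the exact grammar meant here:

```python
import nltk

# Tagged tokens from the question, including the custom tags 'P', 'A' and 'N'.
tagged = [
    ('greatest', 'P'), ('internet', 'NN'), ('ever', 'A'),
    (',', ','), ('and', 'CC'), ('its', 'PRP$'), ('being', 'VBG'),
    ('slow', 'N'), ('as', 'IN'), ('hell', 'NN'),
]

# Assumed intent: zero or more <A> followed by one or more <P>.
# No backticks around the rule, and a single quantifier per tag.
grammar = r"""
Chunk: {<A>*<P>+}
"""

parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)
print(tree)
```

On the sample data this should group only greatest/P into a Chunk subtree, since ever/A is not followed by a P tag; the remaining tokens stay directly under the root.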
I'm not familiar with NLTK, but in Python regular expressions ?* is a syntax error. Perhaps you meant *?, which is a lazy quantifier.
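The difference can be checked with the standard re module alone. This is a small sketch using a made-up pattern on the letter a, not the tag pattern from the question:

```python
import re

# `?*` stacks two quantifiers, which the re module rejects at compile time.
try:
    re.compile(r"a?*")
    stacked_ok = True
except re.error:
    stacked_ok = False

# `*` is greedy: it consumes as many characters as possible.
greedy = re.match(r"a*", "aaa").group()

# `*?` is lazy: it consumes as few characters as possible (here, none).
lazy = re.match(r"a*?", "aaa").group()

print(stacked_ok, repr(greedy), repr(lazy))
```

As far as I know, NLTK turns tag patterns into ordinary Python regular expressions, so the same restriction on stacked quantifiers would apply inside a chunk rule.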