How to parse custom tags using nltk.RegexpParser()

Posted 2019-02-07 12:57

My question is similar to this unanswered question: "Using custom POS tags for NLTK chunking?", but the error I am getting is different. I am trying to parse a sentence to which I have added my own domain-specific tags.

For example:

(u'greatest', 'P'), (u'internet', 'NN'), (u'ever', 'A'), 
(u',', ','), (u'and', 'CC'), (u'its', 'PRP$'), (u'being', 'VBG'), 
(u'slow', 'N'), (u'as', 'IN'), (u'hell', 'NN')

where (u'slow', 'N') is a custom tag 'N'.

I am trying to parse this using the following:

grammar=r"""
Chunk:`{<A>?*<P>+}`
"""
parser=nltk.RegexpParser(grammar)

But I am getting the following error:

ValueError: Illegal chunk pattern: `{<A>?*<P>+}`

Does nltk.RegexpParser process custom tags? Is there any other nltk or python based parser which can do that?

2 answers
骚年 ilove
#2 · 2019-02-07 13:04

nltk.RegexpParser can process custom tags.

Here is how you can modify your code to work:

# Import the RegexpParser
from nltk.chunk import RegexpParser

# Define your custom tagged data. 
tags = [(u'greatest', 'P'), (u'internet', 'NN'), (u'ever', 'A'), 
(u',', ','), (u'and', 'CC'), (u'its', 'PRP$'), (u'being', 'VBG'), 
(u'slow', 'N'), (u'as', 'IN'), (u'hell', 'NN')]

# Define your custom grammar (backticks removed and "?*" reduced to "*" so the rule is valid).
grammar = """ CHUNK: {<A>*<P>+} """

# Create an instance of your custom parser.
custom_tag_parser = RegexpParser(grammar)

# Parse!
custom_tag_parser.parse(tags)

This is the result you would get for your test data:

Tree('S', [Tree('CHUNK', [(u'greatest', 'P')]), (u'internet', 'NN'), (u'ever', 'A'), (u',', ','), (u'and', 'CC'), (u'its', 'PRP$'), (u'being', 'VBG'), (u'slow', 'N'), (u'as', 'IN'), (u'hell', 'NN')])
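
As a small follow-up sketch (not part of the answer above), the chunks can be pulled out of the resulting tree with Tree.subtrees(). The helper name extract_chunks is made up for illustration. The last part checks that the grammar exactly as posted in the question is rejected; as far as I can tell, the backticks alone are enough for that, since a chunk rule has to start with { and end with }.

```python
from nltk.chunk import RegexpParser

# Same tagged tokens as in the question (u'' prefixes dropped, Python 3).
tagged = [("greatest", "P"), ("internet", "NN"), ("ever", "A"),
          (",", ","), ("and", "CC"), ("its", "PRP$"), ("being", "VBG"),
          ("slow", "N"), ("as", "IN"), ("hell", "NN")]


def extract_chunks(tree, label):
    """Return the (word, tag) leaves of every subtree labelled `label`.

    Hypothetical helper, written only for this example.
    """
    return [subtree.leaves()
            for subtree in tree.subtrees(filter=lambda t: t.label() == label)]


# Corrected grammar from the answer: zero or more A tags, then one or more P tags.
parser = RegexpParser("CHUNK: {<A>*<P>+}")
tree = parser.parse(tagged)
chunks = extract_chunks(tree, "CHUNK")
print(chunks)

# Grammar exactly as posted in the question, backticks included.
original_grammar = r"""
Chunk:`{<A>?*<P>+}`
"""
try:
    RegexpParser(original_grammar)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

Only 'greatest' ends up in a chunk here: 'ever' is tagged A but is not followed by a P, so the pattern does not match at that position.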
对你真心纯属浪费
#3 · 2019-02-07 13:12

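I'm not familiar with NLTK, but in Python regular expressions ?* is a syntax error: the re module rejects a quantifier applied directly to another quantifier with a "multiple repeat" error. Perhaps you meant *?, which is a lazy quantifier.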
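
To make the difference concrete, here is a minimal sketch using only the standard re module (plain strings, not NLTK tag patterns). It shows that ?* fails to compile while *? compiles and matches as little as possible. The sample string "<A><P>" is just an illustration.

```python
import re

# "?*" stacks one quantifier on another, which re refuses to compile.
try:
    re.compile(r"a?*")
    stacked_ok = True
except re.error as err:
    stacked_ok = False
    print("rejected:", err)

# "*" is greedy: it grabs as much as it can.
greedy = re.match(r"<.*>", "<A><P>").group()

# "*?" is lazy: it stops at the first point where the rest of the pattern matches.
lazy = re.match(r"<.*?>", "<A><P>").group()

print(greedy)  # <A><P>
print(lazy)    # <A>
```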
