how to represent gazetteers or dictionaries as fea

Posted 2019-05-03 17:07

How can I use gazetteers or dictionaries as features in CRF++?

To elaborate: suppose I want to do NER on person names, and I have a gazetteer (or dictionary) of commonly seen person names. How can I use this gazetteer as input to CRF++?

I am using the conditional random field package CRF++ for named entity recognition. I know how to represent some commonly used features in CRF++. For example, to use capitalization as a feature, we can add a separate column to the training data indicating whether each word is capitalized, and refer to that column in the feature template.

1 answer
爷、活的狠高调
#2 · 2019-05-03 17:31

You could add a new feature that indicates whether a token is in the dictionary/gazetteer. Just check for set membership and set the gazetteer feature to 1 or 0.
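
As a rough illustration of the set-membership idea, here is a minimal Python sketch that writes the flag as an extra column in CRF++-style data (one token per line, whitespace-separated columns, label last, blank line between sentences). The function name `add_gazetteer_column`, the sample gazetteer, and the sample sentence are all made up for this example, and the case-insensitive, single-token lookup is an assumption, not something the answer specifies.

```python
def add_gazetteer_column(sentences, gazetteer):
    """Build CRF++-style text with columns: token, gazetteer flag, label.

    `sentences` is a list of sentences, each a list of (token, label) pairs.
    `gazetteer` is any iterable of single-token names (hypothetical data).
    """
    # Lowercase once so the membership test is case-insensitive (an assumption).
    names = {name.lower() for name in gazetteer}
    lines = []
    for sentence in sentences:
        for token, label in sentence:
            flag = "1" if token.lower() in names else "0"
            lines.append(f"{token}\t{flag}\t{label}")
        lines.append("")  # blank line marks the end of a sentence
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
GAZETTEER = {"John", "Mary"}
SENTENCES = [[("John", "B-PER"), ("met", "O"), ("Mary", "B-PER")]]

if __name__ == "__main__":
    print(add_gazetteer_column(SENTENCES, GAZETTEER))
```

With the flag in column 1, a template line such as `U10:%x[0,1]` would expose it for the current token, and `%x[-1,1]` or `%x[1,1]` for its neighbours. Multi-word names are not handled by this per-token lookup and would need a separate matching pass.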
