When I use CRF++ 0.58 to train a named-entity (NE) model, the program fails with this message:
"reading training data:tagger.cpp(399) [feature_index_->buildFeatures(this)] 0.00s"
- Development environment:
- Red Hat Linux 6.5, gcc 5.0, CRF++ 0.58
- Feature template I wrote:
- template
- Dataset:
- Boson_train.txt
- Boson_test.txt
- The first column is the word, the second column is the POS tag, and the third column is the NER tag.
- The problem:
- When I try to train the NER model, I run the command "crf_learn -f 3 -c 4.0 template Boson_train crf_model" and get this message: "reading training data:tagger.cpp(399) [feature_index_->buildFeatures(this)] 0.00s". I don't know C++, so I can't work out the cause from the source code.
- What I have tried:
- 1. Changed the encoding of the dataset. I used Notepad++ to convert it from "UTF-8 without BOM" to "UTF-8". It didn't work.
- 2. Changed the delimiter from '\t' to ' ' (space). It didn't work.
- 3. I thought the template might be wrong, so I tested with crf++0.58/example/seg/template. It worked. But that template is simple, so I tried /example/JapaneseNE/template, which is more similar to my feature template. It didn't work. Then I checked the JapaneseNE example itself, and it works well. So I am confused. Can someone help me?
- template
- U00:%x[-2,0]
- U01:%x[-1,0]
- U02:%x[0,0]
- U03:%x[1,0]
- U04:%x[2,0]
- U05:%x[-2,0]/%x[-1,0]/%x[0,0]
- U06:%x[-1,0]/%x[0,0]/%x[1,0]
- U07:%x[0,0]/%x[1,0]/%x[2,0]
- U08:%x[-1,0]/%x[0,0]
- U09:%x[0,0]/%x[1,0]
- U10:%x[-2,1]/%x[0,1]
- U11:%x[-2,1]/%x[1,1]
- U11:%x[-1,1]/%x[0,1]
- U12:%x[0,0]/%x[0,1]
- U13:%x[0,1]/%x[1,1]
- U14:%x[0,1]/%x[2,1]
- U15:%x[-1,0]/%x[0,1]
- U16:%x[-1,0]/%x[-1,1]
- U17:%x[1,0]/%x[1,1]
- U18:%x[1,0]/%x[1,1]
- U19:%x[2,0]/%x[2,1]
- U20:%x[-1,2]
- U21:%x[-2,2]
- U22:%x[0,1]/%x[-1,2]
- U23:%x[0,1]/%x[-2,2]
- U24:%x[0,0]/%x[-1,2]
- U25:%x[0,0]/%x[-2,2]
- U26:%x[-1,2]/%x[-2,2]/%x[0,1]
- U27:%x[-2,2]/%x[0,1]/%x[1,1]
- U28:%x[-1,1]/%x[-1,2]/%x[0,1]
- U29:%x[-1,2]/%x[0,0]/%x[0,1]
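
One possible sanity check for the template above, sketched in Python. It rests on an assumption that should be verified against the CRF++ documentation: that the last column of the training file is the answer tag, so a `%x[row,col]` macro can only address the columns before it. With three columns (word, POS, NER tag), that would leave only columns 0 and 1 addressable. The helper names `max_template_column` and `columns_out_of_range` are made up for this illustration, and this is not confirmed to be the cause of the error.

```python
import re

# Matches CRF++-style macros such as %x[-1,0]; captures the row offset and column index.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def max_template_column(template_text):
    """Return the largest column index referenced by any %x[row,col] macro, or -1."""
    cols = [int(col) for _row, col in MACRO.findall(template_text)]
    return max(cols, default=-1)

def columns_out_of_range(template_text, n_columns):
    """List template lines whose macros reference a column >= n_columns - 1.

    Assumption: the last of n_columns is the answer tag, so only columns
    0 .. n_columns - 2 are addressable by %x.
    """
    limit = n_columns - 1
    bad = []
    for line in template_text.splitlines():
        if any(int(col) >= limit for _row, col in MACRO.findall(line)):
            bad.append(line.strip())
    return bad

# Three lines taken from the template above; the data has 3 columns.
sample = """U02:%x[0,0]
U12:%x[0,0]/%x[0,1]
U20:%x[-1,2]"""
print(max_template_column(sample))
print(columns_out_of_range(sample, 3))
```

Running the same functions on the full template file would list every line that references column 2, which may help narrow down which rules to test first.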
- Boson_train
- 浙江 ns B_product_name
- 在线 b I_product_name
- 杭州 ns I_product_name
- 4 m B_time
- 月 m I_time
- 25 m I_time
- 日 m I_time
- 讯 ng Out
- ( x Out
- 记者 n Out
- x Out
- x B_person_name
- 施宇翔 nr I_person_name
- x Out
- 通讯员 n B_person_name
- x Out
- 方英 nr B_person_name
- ) x Out
- 毒贩 n Out
- 很 zg Out
- “ x Out
- 时髦 nr Out
- ” x Out
- , x Out
- 用 p Out
- 微信 vn B_product_name
- 交易 n Out
- 毒品 n Out
- 。 x Out
- 没 v Out
- 料想 v Out
- 警方 n B_person_name
- 也 d Out
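
Some rows in the sample above show only two fields (for example "x Out"), possibly because the word itself was a whitespace character. CRF++ expects every token line to have the same number of columns, so a quick consistency check may be worth running. The following is a minimal Python sketch, assuming whitespace-separated columns and blank lines between sentences; the function names are hypothetical.

```python
from collections import Counter

def column_counts(lines):
    """Count how many non-blank lines have each number of whitespace-separated columns."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # blank lines separate sentences, so skip them
            counts[len(fields)] += 1
    return counts

def irregular_lines(lines):
    """Return (line_number, text) for lines whose column count differs from the most common one."""
    counts = column_counts(lines)
    if not counts:
        return []
    expected = counts.most_common(1)[0][0]
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.split() and len(line.split()) != expected
    ]

# A few rows in the shape of the sample above, plus a blank sentence separator.
sample = ["浙江 ns B_product_name", "在线 b I_product_name", "x Out", "", "记者 n Out"]
print(irregular_lines(sample))
```

To check the real file, the lines could be read with `open("Boson_train", encoding="utf-8").readlines()` and passed to `irregular_lines`.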