Python Named Entity Recognition Error: IndexError: list index out of range

Posted 2019-08-17 04:20

Question:

Hi, I am new to Python and tried to run the tagger script (https://github.com/detuvoldo/tagger). I replaced these two lines in utils.py because I am using Windows 10 and ran into path-related issues:

models_path = u"\\\\?\\" + os.path.abspath(u".\\models")
eval_path = os.path.abspath(u".\\evaluation")

The error is

run train.py --train lstm/fold1/train --dev lstm/fold1/dev --test lstm/fold1/test
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: \\?\E:\New-Code\tagger-master\tagger-master\models\tag_scheme=iob,lower=False,zeros=False,char_dim=25,char_lstm_dim=25,char_bidirect=True,word_dim=100,word_lstm_dim=100,word_bidirect=True,pre_emb=,all_emb=False,cap_dim=0,crf=True,dropout=0.3,lr_method=sgd-lr_.005
Found 2573 unique words (48986 in total)
Found 64 unique characters
Found 27 unique named entity tags
858 / 289 / 286 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 101.645935
100, cost average: 83.234520
150, cost average: 82.757523
200, cost average: 69.019493
250, cost average: 64.411346
300, cost average: 62.836563
350, cost average: 60.969635
400, cost average: 58.851826
450, cost average: 49.994457
ID NE Total O I-LOC B-CTT B-OBJ B-LOC B-ACR B-INT B-PRC I-FACE I-PRC I-ACR I-OBJ B-FNUM I-FNUM I-DDIR B-FACE I-BEDNUM I-CTT B-DDIR I-INT B-BEDNUM B-BATHNUM I-BATHNUM I-FPOS B-FPOS I-BDIR B-BDIR Percent
0 O 9314 9175 0 63 14 0 0 62 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 98.508
1 I-LOC 2604 2602 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
2 B-CTT 478 245 0 233 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48.745
3 B-OBJ 464 282 0 0 177 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38.147
4 B-LOC 439 439 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
5 B-ACR 346 334 0 1 1 0 7 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.023
6 B-INT 339 126 0 0 32 0 0 181 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53.392
7 B-PRC 233 232 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
8 I-FACE 218 218 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9 I-PRC 232 225 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
10 I-ACR 214 203 0 0 2 0 1 0 0 0 0 7 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3.271
11 I-OBJ 201 198 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
12 B-FNUM 170 156 0 0 5 0 0 8 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
13 I-FNUM 166 157 0 0 8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
14 I-DDIR 170 169 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
15 B-FACE 120 120 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
16 I-BEDNUM 103 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
17 I-CTT 103 98 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
18 B-DDIR 83 83 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
19 I-INT 57 56 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
20 B-BEDNUM 57 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
21 B-BATHNUM 44 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
22 I-BATHNUM 45 44 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
23 I-FPOS 42 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
24 B-FPOS 37 36 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
25 I-BDIR 22 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
26 B-BDIR 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9780/16307 (59.97424%)
Traceback (most recent call last):

File "E:\New-Code\tagger-master\tagger-master\train.py", line 221, in <module>
dev_data, id_to_tag, dico_tags, epoch)

File "utils.py", line 284, in evaluate
return float(eval_lines[1].strip().split()[-1])

IndexError: list index out of range

Can you please suggest something that would help me solve this error? I have been stuck on it for the last 2 months. Thanks.

Answer 1:

I assume that you are running the script from the E:\New-Code\tagger-master\tagger-master\ directory and that "models" and "evaluation" are directly inside it. In that case, the following should specify the paths correctly:

models_path = "models"
eval_path = "evaluation"
eval_temp = os.path.join(eval_path, "temp")
eval_script = os.path.join(eval_path, "conlleval")
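
To confirm that these relative paths resolve where you expect, something like the check below can be run from the same directory before training. It is only a sketch: check_tagger_paths is a hypothetical helper, not part of the tagger repository, and it assumes the directory layout described above.

```python
import os


def check_tagger_paths(base_dir="."):
    """Report which of the expected tagger paths exist under base_dir.

    Hypothetical helper (not part of the tagger repo); the key names
    mirror the variables assigned in utils.py.
    """
    eval_path = os.path.join(base_dir, "evaluation")
    paths = {
        "models_path": os.path.join(base_dir, "models"),
        "eval_path": eval_path,
        "eval_temp": os.path.join(eval_path, "temp"),
        "eval_script": os.path.join(eval_path, "conlleval"),
    }
    # Map each name to (absolute path, whether it exists on disk).
    return {
        name: (os.path.abspath(p), os.path.exists(p))
        for name, p in paths.items()
    }


if __name__ == "__main__":
    for name, (path, exists) in sorted(check_tagger_paths().items()):
        print("%-12s %-8s %s" % (name, "OK" if exists else "MISSING", path))
```

Any entry reported as MISSING means the four assignments above do not point at the directory the script is actually running from.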

If you still see this error with these path settings, the problem is with one of your "eval.*.scores" files, not with the path specification. I can't say for sure what the file must contain, but please at least post its actual content.
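
The traceback shows that evaluate reads eval_lines[1], so the IndexError means the scores file has fewer than two lines. One possible cause (an assumption on my part, not verified on Windows) is that the evaluation script never ran and left the file empty. A defensive read along the lines of the sketch below would show what the file actually holds; read_overall_score is a hypothetical name, and the parsing simply mirrors the failing expression from the traceback.

```python
import io


def read_overall_score(scores_path):
    """Return the last number on the second line of a scores file.

    Hypothetical helper: it mirrors the failing expression
    float(eval_lines[1].strip().split()[-1]) from the traceback, but
    raises a descriptive error instead of a bare IndexError.
    """
    with io.open(scores_path, "r", encoding="utf-8") as f:
        eval_lines = [line.rstrip() for line in f]
    # Fewer than two lines (or a blank second line) is exactly the
    # situation that triggers "list index out of range".
    if len(eval_lines) < 2 or not eval_lines[1].split():
        raise ValueError(
            "%s has %d line(s); expected at least 2. Content: %r"
            % (scores_path, len(eval_lines), eval_lines)
        )
    return float(eval_lines[1].strip().split()[-1])
```

Calling it on the newest "eval.*.scores" file in evaluation\temp should either return the score or print the file's real content in the error message, which is the information needed to go further.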