Caffe's test accuracy during validation phase

Posted 2019-06-05 22:56

Question:

I wonder why my test accuracy stays at a constant value of 0.5. I use the CaffeNet network; the only change is in the fully connected layer, where I set num_output: 2.

My training set contains 1000 positive and 1000 negative examples, and my validation set likewise has 1000 positive and 1000 negative examples. The dataset consists of full-body RGB images of people. I've defined a mean file and a scale value in the data layer. The network is trained as a binary classifier: person or not.

A snippet of my solver configuration is shown below:

test_iter: 80
test_interval: 10
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 20
display: 10
max_iter: 80
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
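
For reference, Caffe's "step" policy lowers the learning rate to base_lr * gamma ^ floor(iter / stepsize). The sketch below applies that formula to the solver values above. It is plain Python, not Caffe code, and the function name step_lr is made up for illustration.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under a "step" policy: multiply by gamma every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)


if __name__ == "__main__":
    # Solver values from the snippet above: base_lr 0.01, gamma 0.1, stepsize 20, max_iter 80.
    for iteration in range(0, 80, 10):
        print(iteration, step_lr(0.01, 0.1, 20, iteration))
```

With stepsize: 20 and max_iter: 80, the rate is cut by a factor of 10 three times within the run, so the last 20 iterations train at roughly 1e-05.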

The training output is as follows:

I0228 11:49:27.411556  3422 solver.cpp:274] Learning Rate Policy: step
I0228 11:49:27.590368  3422 solver.cpp:331] Iteration 0, Testing net (#0)
I0228 11:53:29.203058  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 11:57:59.969632  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 11:58:26.602972  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 11:58:26.602999  3422 solver.cpp:398]     Test net output #1: loss = 0.726503 (* 1 = 0.726503 loss)
I0228 12:00:03.892771  3422 solver.cpp:219] Iteration 0 (-6.49109e-41 iter/s, 636.481s/10 iters), loss = 0.961699
I0228 12:00:03.892915  3422 solver.cpp:238]     Train net output #0: loss = 0.961699 (* 1 = 0.961699 loss)
I0228 12:00:03.892925  3422 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0228 12:04:28.831887  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:13:36.909935  3422 solver.cpp:331] Iteration 10, Testing net (#0)
I0228 12:17:36.894516  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:22:00.724030  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:22:27.375306  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 12:22:27.375334  3422 solver.cpp:398]     Test net output #1: loss = 0.698973 (* 1 = 0.698973 loss)
I0228 12:23:56.072116  3422 solver.cpp:219] Iteration 10 (0.00698237 iter/s, 1432.18s/10 iters), loss = 0.696559
I0228 12:23:56.072247  3422 solver.cpp:238]     Train net output #0: loss = 0.696558 (* 1 = 0.696558 loss)
I0228 12:23:56.072252  3422 sgd_solver.cpp:105] Iteration 10, lr = 0.01
I0228 12:25:23.664594  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:37:08.202978  3422 solver.cpp:331] Iteration 20, Testing net (#0)
I0228 12:41:05.859966  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:45:28.599306  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:45:55.524168  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 12:45:55.524190  3422 solver.cpp:398]     Test net output #1: loss = 0.693187 (* 1 = 0.693187 loss)
I0228 12:45:55.553427  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:47:24.159780  3422 solver.cpp:219] Iteration 20 (0.00710183 iter/s, 1408.09s/10 iters), loss = 0.690313
I0228 12:47:24.159914  3422 solver.cpp:238]     Train net output #0: loss = 0.690313 (* 1 = 0.690313 loss)
I0228 12:47:24.159920  3422 sgd_solver.cpp:105] Iteration 20, lr = 0.001
I0228 12:57:31.167225  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:00:23.671567  3422 solver.cpp:331] Iteration 30, Testing net (#0)
I0228 13:04:14.114737  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:08:30.406244  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:08:56.273648  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 13:08:56.273674  3422 solver.cpp:398]     Test net output #1: loss = 0.696971 (* 1 = 0.696971 loss)
I0228 13:10:28.487870  3422 solver.cpp:219] Iteration 30 (0.00722373 iter/s, 1384.33s/10 iters), loss = 0.700565
I0228 13:10:28.488041  3422 solver.cpp:238]     Train net output #0: loss = 0.700565 (* 1 = 0.700565 loss)
I0228 13:10:28.488049  3422 sgd_solver.cpp:105] Iteration 30, lr = 0.001
I0228 13:17:38.463490  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:23:29.700287  3422 solver.cpp:331] Iteration 40, Testing net (#0)
I0228 13:27:27.217670  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:31:48.651156  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:32:15.021637  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 13:32:15.021661  3422 solver.cpp:398]     Test net output #1: loss = 0.694784 (* 1 = 0.694784 loss)
I0228 13:33:43.542735  3422 solver.cpp:219] Iteration 40 (0.00716818 iter/s, 1395.05s/10 iters), loss = 0.700307
I0228 13:33:43.542875  3422 solver.cpp:238]     Train net output #0: loss = 0.700307 (* 1 = 0.700307 loss)
I0228 13:33:43.542897  3422 sgd_solver.cpp:105] Iteration 40, lr = 0.0001
I0228 13:36:37.602869  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:46:57.980952  3422 solver.cpp:331] Iteration 50, Testing net (#0)
I0228 13:50:55.125911  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:55:22.078013  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:55:49.644492  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 13:55:49.644516  3422 solver.cpp:398]     Test net output #1: loss = 0.693804 (* 1 = 0.693804 loss)
I0228 13:57:19.439967  3422 solver.cpp:219] Iteration 50 (0.00706266 iter/s, 1415.9s/10 iters), loss = 0.685755
I0228 13:57:19.440101  3422 solver.cpp:238]     Train net output #0: loss = 0.685755 (* 1 = 0.685755 loss)
I0228 13:57:19.440107  3422 sgd_solver.cpp:105] Iteration 50, lr = 0.0001
I0228 13:57:19.843221  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:09:13.012436  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:10:40.182121  3422 solver.cpp:331] Iteration 60, Testing net (#0)
I0228 14:14:37.148968  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:18:57.929569  3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:19:24.183915  3422 solver.cpp:398]     Test net output #0: accuracy = 0.5
I0228 14:19:24.183939  3422 solver.cpp:398]     Test net output #1: loss = 0.693612 (* 1 = 0.693612 loss)
I0228 14:20:51.017705  3422 solver.cpp:219] Iteration 60 (0.00708428 iter/s, 1411.58s/10 iters), loss = 0.693453
I0228 14:20:51.017838  3422 solver.cpp:238]     Train net output #0: loss = 0.693453 (* 1 = 0.693453 loss)
I0228 14:20:51.017845  3422 sgd_solver.cpp:105] Iteration 60, lr = 1e-05
I0228 14:29:34.635071  3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:34:02.693697  3422 solver.cpp:331] Iteration 70, Testing net (#0)
I0228 14:37:59.742414  3429 data_layer.cpp:73] Restarting data prefetching from start.
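
One observation about the log above: the loss settles at about 0.693, which is ln 2. That is the softmax cross-entropy of a two-class model that assigns probability 0.5 to each class whatever the input, and an accuracy of exactly 0.5 on a balanced set is consistent with the same class being predicted for every image. A small sketch of the arithmetic, in plain Python without Caffe (the function names are illustrative):

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, label):
    """Softmax cross-entropy loss for one example with the given true label."""
    return -math.log(softmax(scores)[label])


# Equal scores for both classes give probabilities of 0.5 each.
chance_loss = cross_entropy([0.0, 0.0], 0)
print(round(chance_loss, 6))  # 0.693147, i.e. ln 2
```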

I also tried changing test_iter to 40 (it was previously 80) after following two linked posts that suggested the parameter might be related, but that did not resolve the problem. I also tried reshuffling the data by regenerating the dataset with a modified create_imagenet.sh script, but the issue remains.

Every time I changed a value in the solver, I also changed the fully connected layer's name. Is this the correct approach?

The number of epochs here is ~10. Is that a possible culprit? Does this kind of problem count as over-fitting?

Any hints or suggestions are welcome.

EDITED:

I turned on the debug info in the solver and found that the gradients (the diff values) are infinitesimal. Can I conclude that the network is learning very little, or nothing at all? The log with the debug info is below:

I0228 19:58:37.235631  6771 net.cpp:593]     [Forward] Layer pool2, top blob pool2 data: 1.00214
I0228 19:58:37.810919  6771 net.cpp:593]     [Forward] Layer norm2, top blob norm2 data: 1.00212
I0228 19:58:42.022397  6771 net.cpp:593]     [Forward] Layer conv3, top blob conv3 data: 0.432846
I0228 19:58:42.022722  6771 net.cpp:605]     [Forward] Layer conv3, param blob 0 data: 0.00796926
I0228 19:58:42.022725  6771 net.cpp:605]     [Forward] Layer conv3, param blob 1 data: 0.000184241
I0228 19:58:42.041185  6771 net.cpp:593]     [Forward] Layer relu3, top blob conv3 data: 0.2017
I0228 19:58:45.277812  6771 net.cpp:593]     [Forward] Layer conv4, top blob conv4 data: 0.989365
I0228 19:58:45.278079  6771 net.cpp:605]     [Forward] Layer conv4, param blob 0 data: 0.00797053
I0228 19:58:45.278082  6771 net.cpp:605]     [Forward] Layer conv4, param blob 1 data: 0.99991
I0228 19:58:45.296561  6771 net.cpp:593]     [Forward] Layer relu4, top blob conv4 data: 0.989365
I0228 19:58:47.495208  6771 net.cpp:593]     [Forward] Layer conv5, top blob conv5 data: 1.52664
I0228 19:58:47.495394  6771 net.cpp:605]     [Forward] Layer conv5, param blob 0 data: 0.00804997
I0228 19:58:47.495399  6771 net.cpp:605]     [Forward] Layer conv5, param blob 1 data: 0.996736
I0228 19:58:47.507951  6771 net.cpp:593]     [Forward] Layer relu5, top blob conv5 data: 0.128866
I0228 19:58:47.562223  6771 net.cpp:593]     [Forward] Layer pool5, top blob pool5 data: 0.151769
I0228 19:58:48.269973  6771 net.cpp:593]     [Forward] Layer fc6, top blob fc6 data: 0.95253
I0228 19:58:48.280905  6771 net.cpp:605]     [Forward] Layer fc6, param blob 0 data: 0.00397552
I0228 19:58:48.280917  6771 net.cpp:605]     [Forward] Layer fc6, param blob 1 data: 0.999847
I0228 19:58:48.282137  6771 net.cpp:593]     [Forward] Layer relu6, top blob fc6 data: 0.935909
I0228 19:58:48.286769  6771 net.cpp:593]     [Forward] Layer drop6, top blob fc6 data: 0.938786
I0228 19:58:48.602710  6771 net.cpp:593]     [Forward] Layer fc7, top blob fc7 data: 3.76741
I0228 19:58:48.607655  6771 net.cpp:605]     [Forward] Layer fc7, param blob 0 data: 0.00411323
I0228 19:58:48.607664  6771 net.cpp:605]     [Forward] Layer fc7, param blob 1 data: 0.997461
I0228 19:58:48.608860  6771 net.cpp:593]     [Forward] Layer relu7, top blob fc7 data: 3.41694e-06
I0228 19:58:48.613621  6771 net.cpp:593]     [Forward] Layer drop7, top blob fc7 data: 3.15335e-06
I0228 19:58:48.615514  6771 net.cpp:593]     [Forward] Layer fc8_new15, top blob fc8_new15 data: 0.0446082
I0228 19:58:48.615520  6771 net.cpp:605]     [Forward] Layer fc8_new15, param blob 0 data: 0.0229027
I0228 19:58:48.615522  6771 net.cpp:605]     [Forward] Layer fc8_new15, param blob 1 data: 0.0444381
I0228 19:58:48.615579  6771 net.cpp:593]     [Forward] Layer loss, top blob loss data: 0.693174
I0228 19:58:48.615586  6771 net.cpp:621]     [Backward] Layer loss, bottom blob fc8_new15 diff: 0.00195124
I0228 19:58:48.617902  6771 net.cpp:621]     [Backward] Layer fc8_new15, bottom blob fc7 diff: 8.65365e-05
I0228 19:58:48.617914  6771 net.cpp:632]     [Backward] Layer fc8_new15, param blob 0 diff: 8.20022e-07
I0228 19:58:48.617916  6771 net.cpp:632]     [Backward] Layer fc8_new15, param blob 1 diff: 0.0105705
I0228 19:58:48.619067  6771 net.cpp:621]     [Backward] Layer drop7, bottom blob fc7 diff: 8.65526e-05
I0228 19:58:48.620265  6771 net.cpp:621]     [Backward] Layer relu7, bottom blob fc7 diff: 1.21017e-09
I0228 19:58:49.261282  6771 net.cpp:621]     [Backward] Layer fc7, bottom blob fc6 diff: 2.00745e-08
I0228 19:58:49.266103  6771 net.cpp:632]     [Backward] Layer fc7, param blob 0 diff: 1.43563e-07
I0228 19:58:49.266114  6771 net.cpp:632]     [Backward] Layer fc7, param blob 1 diff: 9.29627e-08
I0228 19:58:49.267330  6771 net.cpp:621]     [Backward] Layer drop6, bottom blob fc6 diff: 1.99176e-08
I0228 19:58:49.268508  6771 net.cpp:621]     [Backward] Layer relu6, bottom blob fc6 diff: 1.85305e-08
I0228 19:58:50.779518  6771 net.cpp:621]     [Backward] Layer fc6, bottom blob pool5 diff: 8.8138e-09
I0228 19:58:50.790220  6771 net.cpp:632]     [Backward] Layer fc6, param blob 0 diff: 3.01911e-07
I0228 19:58:50.790235  6771 net.cpp:632]     [Backward] Layer fc6, param blob 1 diff: 1.99256e-06
I0228 19:58:50.813318  6771 net.cpp:621]     [Backward] Layer pool5, bottom blob conv5 diff: 1.84585e-09
I0228 19:58:50.826406  6771 net.cpp:621]     [Backward] Layer relu5, bottom blob conv5 diff: 3.86034e-10
I0228 19:58:55.093768  6771 net.cpp:621]     [Backward] Layer conv5, bottom blob conv4 diff: 5.76684e-10
I0228 19:58:55.093967  6771 net.cpp:632]     [Backward] Layer conv5, param blob 0 diff: 1.47824e-06
I0228 19:58:55.093973  6771 net.cpp:632]     [Backward] Layer conv5, param blob 1 diff: 1.92951e-06
I0228 19:58:55.114212  6771 net.cpp:621]     [Backward] Layer relu4, bottom blob conv4 diff: 5.76684e-10
I0228 19:59:01.392058  6771 net.cpp:621]     [Backward] Layer conv4, bottom blob conv3 diff: 2.31243e-10
I0228 19:59:01.392359  6771 net.cpp:632]     [Backward] Layer conv4, param blob 0 diff: 1.76617e-07
I0228 19:59:01.392364  6771 net.cpp:632]     [Backward] Layer conv4, param blob 1 diff: 8.78101e-07
I0228 19:59:01.412240  6771 net.cpp:621]     [Backward] Layer relu3, bottom blob conv3 diff: 8.56331e-11
I0228 19:59:09.734658  6771 net.cpp:621]     [Backward] Layer conv3, bottom blob norm2 diff: 7.87699e-11
I0228 19:59:09.735258  6771 net.cpp:632]     [Backward] Layer conv3, param blob 0 diff: 1.33159e-07
I0228 19:59:09.735270  6771 net.cpp:632]     [Backward] Layer conv3, param blob 1 diff: 1.47704e-07
I0228 19:59:10.390552  6771 net.cpp:621]     [Backward] Layer norm2, bottom blob pool2 diff: 7.87615e-11
I0228 19:59:10.452433  6771 net.cpp:621]     [Backward] Layer pool2, bottom blob conv2 diff: 1.50474e-11
I0228 19:59:10.516407  6771 net.cpp:621]     [Backward] Layer relu2, bottom blob conv2 diff: 1.50474e-11
I0228 19:59:20.241587  6771 net.cpp:621]     [Backward] Layer conv2, bottom blob norm1 diff: 2.07819e-11
I0228 19:59:20.241801  6771 net.cpp:632]     [Backward] Layer conv2, param blob 0 diff: 3.61894e-09
I0228 19:59:20.241807  6771 net.cpp:632]     [Backward] Layer conv2, param blob 1 diff: 1.05108e-07
I0228 19:59:35.405725  6771 net.cpp:621]     [Backward] Layer norm1, bottom blob pool1 diff: 2.07819e-11
I0228 19:59:35.494249  6771 net.cpp:621]     [Backward] Layer pool1, bottom blob conv1 diff: 4.26e-12
I0228 19:59:35.585350  6771 net.cpp:621]     [Backward] Layer relu1, bottom blob conv1 diff: 3.25633e-12
I0228 19:59:38.335880  6771 net.cpp:632]     [Backward] Layer conv1, param blob 0 diff: 9.37551e-09
I0228 19:59:38.335896  6771 net.cpp:632]     [Backward] Layer conv1, param blob 1 diff: 5.86281e-08
E0228 19:59:38.411557  6771 net.cpp:721]     [Backward] All net params (data, diff): L1 norm = (246967, 14.733); L2 norm = (103.38, 0.0470958)
I0228 19:59:38.411592  6771 solver.cpp:219] Iteration 70 (0.00886075 iter/s, 1128.57s/10 iters), loss = 0.693174
I0228 19:59:38.411600  6771 solver.cpp:238]     Train net output #0: loss = 0.693174 (* 1 = 0.693174 loss)
I0228 19:59:38.411605  6771 sgd_solver.cpp:105] Iteration 70, lr = 1e-05
I0228 20:05:17.468423  6775 data_layer.cpp:73] Restarting data prefetching from start.
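
To read a debug log like the one above more easily, the [Backward] lines can be pulled out and filtered for very small diff values. The sketch below is an assumption-laden helper, not part of Caffe: the regular expression is written against the line format shown in this post, and the 1e-8 threshold is an arbitrary choice for illustration.

```python
import re

# Matches lines such as:
#   ... [Backward] Layer relu7, bottom blob fc7 diff: 1.21017e-09
#   ... [Backward] Layer fc7, param blob 0 diff: 1.43563e-07
BACKWARD_RE = re.compile(
    r"\[Backward\] Layer (\S+), (bottom|param) blob (\S+) diff: (\S+)"
)


def parse_backward(lines):
    """Return (layer, kind, blob, diff) tuples for every [Backward] layer line."""
    rows = []
    for line in lines:
        match = BACKWARD_RE.search(line)
        if match:
            layer, kind, blob, value = match.groups()
            rows.append((layer, kind, blob, float(value)))
    return rows


def tiny_gradients(rows, threshold=1e-8):
    """Keep only rows whose diff is below the (arbitrary) threshold."""
    return [row for row in rows if row[3] < threshold]


# Three lines copied from the log above.
sample = [
    "I0228 19:58:48.617902  6771 net.cpp:621]     [Backward] Layer fc8_new15, bottom blob fc7 diff: 8.65365e-05",
    "I0228 19:58:48.620265  6771 net.cpp:621]     [Backward] Layer relu7, bottom blob fc7 diff: 1.21017e-09",
    "I0228 19:58:49.266103  6771 net.cpp:632]     [Backward] Layer fc7, param blob 0 diff: 1.43563e-07",
]
rows = parse_backward(sample)
for layer, kind, blob, diff in tiny_gradients(rows):
    print(layer, kind, blob, diff)
```

In the log above, the diff falls from about 8.65e-05 at drop7 to about 1.2e-09 at relu7, and the forward output of relu7 is only about 3.4e-06. That may suggest that almost no fc7 units are active, so very little gradient reaches the earlier layers.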

Answer 1:

data_layer.cpp:73] Restarting data prefetching from start.

The above message is logged when the data layer reaches the end of the .txt file given to it as input and starts reading again from the beginning.

This message can appear frequently when:

  1. You gave the wrong .txt file to the data layer.
  2. The format of the .txt file is not what Caffe expects.
  3. The file contains very few samples.
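
The three checks above can be sketched as a small script. It assumes the list file uses one "path label" pair per line, separated by whitespace, with a non-negative integer label, which is the layout consumed by the convert_imageset tool that create_imagenet.sh calls. The function name check_list_file and the sample paths are hypothetical.

```python
from collections import Counter


def check_list_file(lines):
    """Count samples per label and collect the numbers of malformed lines.

    Assumes each valid line is "<image path> <integer label>".
    Paths containing spaces would be reported as malformed by this simple check.
    """
    labels = Counter()
    bad_lines = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 2 or not parts[1].isdigit():
            bad_lines.append(number)
            continue
        labels[int(parts[1])] += 1
    return labels, bad_lines


# Hypothetical file contents; the third line is missing its label.
sample = ["pos/0001.jpg 1", "neg/0001.jpg 0", "neg/0002.jpg", ""]
labels, bad_lines = check_list_file(sample)
print(dict(labels), bad_lines)
```

For the dataset described in the question, a healthy training list would be expected to report 1000 samples for each of the two labels and no malformed lines.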