Why does my test accuracy stay at a constant 0.5? I am using the CaffeNet network, and the only change is in the last fully connected layer, where I set num_output: 2.
My training set contains 1000 positive and 1000 negative examples, and my validation set likewise has 1000 positive and 1000 negative examples. The dataset consists of full-body RGB color images of people. I've defined a mean file and a scale value in the data layer. The network is trained to decide whether an image contains a person or not (a binary classifier).
A snippet of my solver configuration is below:
test_iter: 80
test_interval: 10
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 20
display: 10
max_iter: 80
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
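As a sanity check on these settings, here is a minimal Python sketch of how I understand the "step" policy: lr = base_lr * gamma^floor(iter / stepsize). The function name step_lr is my own and is not part of Caffe.

```python
import math

def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=20):
    """Learning rate under the "step" policy: base_lr * gamma^floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# Print the rate at every display interval of the 80-iteration run.
for it in range(0, 80, 10):
    print(f"iteration {it:2d}: lr = {step_lr(it):g}")
```

With stepsize: 20 the rate drops tenfold every 20 iterations, so by iteration 60 it is already down to 1e-05.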
The training output is as follows:
I0228 11:49:27.411556 3422 solver.cpp:274] Learning Rate Policy: step
I0228 11:49:27.590368 3422 solver.cpp:331] Iteration 0, Testing net (#0)
I0228 11:53:29.203058 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 11:57:59.969632 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 11:58:26.602972 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 11:58:26.602999 3422 solver.cpp:398] Test net output #1: loss = 0.726503 (* 1 = 0.726503 loss)
I0228 12:00:03.892771 3422 solver.cpp:219] Iteration 0 (-6.49109e-41 iter/s, 636.481s/10 iters), loss = 0.961699
I0228 12:00:03.892915 3422 solver.cpp:238] Train net output #0: loss = 0.961699 (* 1 = 0.961699 loss)
I0228 12:00:03.892925 3422 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I0228 12:04:28.831887 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:13:36.909935 3422 solver.cpp:331] Iteration 10, Testing net (#0)
I0228 12:17:36.894516 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:22:00.724030 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:22:27.375306 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 12:22:27.375334 3422 solver.cpp:398] Test net output #1: loss = 0.698973 (* 1 = 0.698973 loss)
I0228 12:23:56.072116 3422 solver.cpp:219] Iteration 10 (0.00698237 iter/s, 1432.18s/10 iters), loss = 0.696559
I0228 12:23:56.072247 3422 solver.cpp:238] Train net output #0: loss = 0.696558 (* 1 = 0.696558 loss)
I0228 12:23:56.072252 3422 sgd_solver.cpp:105] Iteration 10, lr = 0.01
I0228 12:25:23.664594 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:37:08.202978 3422 solver.cpp:331] Iteration 20, Testing net (#0)
I0228 12:41:05.859966 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:45:28.599306 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:45:55.524168 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 12:45:55.524190 3422 solver.cpp:398] Test net output #1: loss = 0.693187 (* 1 = 0.693187 loss)
I0228 12:45:55.553427 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 12:47:24.159780 3422 solver.cpp:219] Iteration 20 (0.00710183 iter/s, 1408.09s/10 iters), loss = 0.690313
I0228 12:47:24.159914 3422 solver.cpp:238] Train net output #0: loss = 0.690313 (* 1 = 0.690313 loss)
I0228 12:47:24.159920 3422 sgd_solver.cpp:105] Iteration 20, lr = 0.001
I0228 12:57:31.167225 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:00:23.671567 3422 solver.cpp:331] Iteration 30, Testing net (#0)
I0228 13:04:14.114737 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:08:30.406244 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:08:56.273648 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 13:08:56.273674 3422 solver.cpp:398] Test net output #1: loss = 0.696971 (* 1 = 0.696971 loss)
I0228 13:10:28.487870 3422 solver.cpp:219] Iteration 30 (0.00722373 iter/s, 1384.33s/10 iters), loss = 0.700565
I0228 13:10:28.488041 3422 solver.cpp:238] Train net output #0: loss = 0.700565 (* 1 = 0.700565 loss)
I0228 13:10:28.488049 3422 sgd_solver.cpp:105] Iteration 30, lr = 0.001
I0228 13:17:38.463490 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:23:29.700287 3422 solver.cpp:331] Iteration 40, Testing net (#0)
I0228 13:27:27.217670 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:31:48.651156 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:32:15.021637 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 13:32:15.021661 3422 solver.cpp:398] Test net output #1: loss = 0.694784 (* 1 = 0.694784 loss)
I0228 13:33:43.542735 3422 solver.cpp:219] Iteration 40 (0.00716818 iter/s, 1395.05s/10 iters), loss = 0.700307
I0228 13:33:43.542875 3422 solver.cpp:238] Train net output #0: loss = 0.700307 (* 1 = 0.700307 loss)
I0228 13:33:43.542897 3422 sgd_solver.cpp:105] Iteration 40, lr = 0.0001
I0228 13:36:37.602869 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:46:57.980952 3422 solver.cpp:331] Iteration 50, Testing net (#0)
I0228 13:50:55.125911 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:55:22.078013 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 13:55:49.644492 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 13:55:49.644516 3422 solver.cpp:398] Test net output #1: loss = 0.693804 (* 1 = 0.693804 loss)
I0228 13:57:19.439967 3422 solver.cpp:219] Iteration 50 (0.00706266 iter/s, 1415.9s/10 iters), loss = 0.685755
I0228 13:57:19.440101 3422 solver.cpp:238] Train net output #0: loss = 0.685755 (* 1 = 0.685755 loss)
I0228 13:57:19.440107 3422 sgd_solver.cpp:105] Iteration 50, lr = 0.0001
I0228 13:57:19.843221 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:09:13.012436 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:10:40.182121 3422 solver.cpp:331] Iteration 60, Testing net (#0)
I0228 14:14:37.148968 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:18:57.929569 3429 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:19:24.183915 3422 solver.cpp:398] Test net output #0: accuracy = 0.5
I0228 14:19:24.183939 3422 solver.cpp:398] Test net output #1: loss = 0.693612 (* 1 = 0.693612 loss)
I0228 14:20:51.017705 3422 solver.cpp:219] Iteration 60 (0.00708428 iter/s, 1411.58s/10 iters), loss = 0.693453
I0228 14:20:51.017838 3422 solver.cpp:238] Train net output #0: loss = 0.693453 (* 1 = 0.693453 loss)
I0228 14:20:51.017845 3422 sgd_solver.cpp:105] Iteration 60, lr = 1e-05
I0228 14:29:34.635071 3426 data_layer.cpp:73] Restarting data prefetching from start.
I0228 14:34:02.693697 3422 solver.cpp:331] Iteration 70, Testing net (#0)
I0228 14:37:59.742414 3429 data_layer.cpp:73] Restarting data prefetching from start.
I also tried changing test_iter to 40 (from the previous 80) after following this link and this one, in case the parameter is related, but that didn't resolve it. I also tried reshuffling the data by regenerating the dataset with a modified create_imagenet.sh script, but the issue remains.
Every time I changed a value in the solver, I also changed the fully connected layer's name. Is this the correct approach?
The number of epochs here is ~10. Could that be the culprit? Does this kind of problem count as over-fitting?
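To show where the epoch figure and the test_iter values come from, here is a rough Python sketch. The batch sizes (256 for training, 50 for testing) are assumptions on my part, since I have not posted the data layer; epochs_seen is just a helper name I made up.

```python
def epochs_seen(iterations, batch_size, num_images):
    """Number of full passes over a dataset after the given number of iterations."""
    return iterations * batch_size / num_images

NUM_TRAIN = 2000  # 1000 positive + 1000 negative
NUM_VAL = 2000    # 1000 positive + 1000 negative

# Assumed batch sizes (not shown in the post).
TRAIN_BATCH = 256
TEST_BATCH = 50

train_epochs = epochs_seen(80, TRAIN_BATCH, NUM_TRAIN)   # max_iter: 80
test_passes_80 = epochs_seen(80, TEST_BATCH, NUM_VAL)    # test_iter: 80
test_passes_40 = epochs_seen(40, TEST_BATCH, NUM_VAL)    # test_iter: 40

print(train_epochs, test_passes_80, test_passes_40)
```

Under these assumed batch sizes, test_iter: 80 would run over the validation set twice per test phase, and test_iter: 40 would cover it once.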
Any hints or suggestions are welcome.
EDITED:
I turned on debug info in the solver and found that the back-propagated diff values are infinitesimal, while the loss itself stays around 0.693. Can I deduce that the network is learning very little, or nothing at all? The log with debug info is below:
I0228 19:58:37.235631 6771 net.cpp:593] [Forward] Layer pool2, top blob pool2 data: 1.00214
I0228 19:58:37.810919 6771 net.cpp:593] [Forward] Layer norm2, top blob norm2 data: 1.00212
I0228 19:58:42.022397 6771 net.cpp:593] [Forward] Layer conv3, top blob conv3 data: 0.432846
I0228 19:58:42.022722 6771 net.cpp:605] [Forward] Layer conv3, param blob 0 data: 0.00796926
I0228 19:58:42.022725 6771 net.cpp:605] [Forward] Layer conv3, param blob 1 data: 0.000184241
I0228 19:58:42.041185 6771 net.cpp:593] [Forward] Layer relu3, top blob conv3 data: 0.2017
I0228 19:58:45.277812 6771 net.cpp:593] [Forward] Layer conv4, top blob conv4 data: 0.989365
I0228 19:58:45.278079 6771 net.cpp:605] [Forward] Layer conv4, param blob 0 data: 0.00797053
I0228 19:58:45.278082 6771 net.cpp:605] [Forward] Layer conv4, param blob 1 data: 0.99991
I0228 19:58:45.296561 6771 net.cpp:593] [Forward] Layer relu4, top blob conv4 data: 0.989365
I0228 19:58:47.495208 6771 net.cpp:593] [Forward] Layer conv5, top blob conv5 data: 1.52664
I0228 19:58:47.495394 6771 net.cpp:605] [Forward] Layer conv5, param blob 0 data: 0.00804997
I0228 19:58:47.495399 6771 net.cpp:605] [Forward] Layer conv5, param blob 1 data: 0.996736
I0228 19:58:47.507951 6771 net.cpp:593] [Forward] Layer relu5, top blob conv5 data: 0.128866
I0228 19:58:47.562223 6771 net.cpp:593] [Forward] Layer pool5, top blob pool5 data: 0.151769
I0228 19:58:48.269973 6771 net.cpp:593] [Forward] Layer fc6, top blob fc6 data: 0.95253
I0228 19:58:48.280905 6771 net.cpp:605] [Forward] Layer fc6, param blob 0 data: 0.00397552
I0228 19:58:48.280917 6771 net.cpp:605] [Forward] Layer fc6, param blob 1 data: 0.999847
I0228 19:58:48.282137 6771 net.cpp:593] [Forward] Layer relu6, top blob fc6 data: 0.935909
I0228 19:58:48.286769 6771 net.cpp:593] [Forward] Layer drop6, top blob fc6 data: 0.938786
I0228 19:58:48.602710 6771 net.cpp:593] [Forward] Layer fc7, top blob fc7 data: 3.76741
I0228 19:58:48.607655 6771 net.cpp:605] [Forward] Layer fc7, param blob 0 data: 0.00411323
I0228 19:58:48.607664 6771 net.cpp:605] [Forward] Layer fc7, param blob 1 data: 0.997461
I0228 19:58:48.608860 6771 net.cpp:593] [Forward] Layer relu7, top blob fc7 data: 3.41694e-06
I0228 19:58:48.613621 6771 net.cpp:593] [Forward] Layer drop7, top blob fc7 data: 3.15335e-06
I0228 19:58:48.615514 6771 net.cpp:593] [Forward] Layer fc8_new15, top blob fc8_new15 data: 0.0446082
I0228 19:58:48.615520 6771 net.cpp:605] [Forward] Layer fc8_new15, param blob 0 data: 0.0229027
I0228 19:58:48.615522 6771 net.cpp:605] [Forward] Layer fc8_new15, param blob 1 data: 0.0444381
I0228 19:58:48.615579 6771 net.cpp:593] [Forward] Layer loss, top blob loss data: 0.693174
I0228 19:58:48.615586 6771 net.cpp:621] [Backward] Layer loss, bottom blob fc8_new15 diff: 0.00195124
I0228 19:58:48.617902 6771 net.cpp:621] [Backward] Layer fc8_new15, bottom blob fc7 diff: 8.65365e-05
I0228 19:58:48.617914 6771 net.cpp:632] [Backward] Layer fc8_new15, param blob 0 diff: 8.20022e-07
I0228 19:58:48.617916 6771 net.cpp:632] [Backward] Layer fc8_new15, param blob 1 diff: 0.0105705
I0228 19:58:48.619067 6771 net.cpp:621] [Backward] Layer drop7, bottom blob fc7 diff: 8.65526e-05
I0228 19:58:48.620265 6771 net.cpp:621] [Backward] Layer relu7, bottom blob fc7 diff: 1.21017e-09
I0228 19:58:49.261282 6771 net.cpp:621] [Backward] Layer fc7, bottom blob fc6 diff: 2.00745e-08
I0228 19:58:49.266103 6771 net.cpp:632] [Backward] Layer fc7, param blob 0 diff: 1.43563e-07
I0228 19:58:49.266114 6771 net.cpp:632] [Backward] Layer fc7, param blob 1 diff: 9.29627e-08
I0228 19:58:49.267330 6771 net.cpp:621] [Backward] Layer drop6, bottom blob fc6 diff: 1.99176e-08
I0228 19:58:49.268508 6771 net.cpp:621] [Backward] Layer relu6, bottom blob fc6 diff: 1.85305e-08
I0228 19:58:50.779518 6771 net.cpp:621] [Backward] Layer fc6, bottom blob pool5 diff: 8.8138e-09
I0228 19:58:50.790220 6771 net.cpp:632] [Backward] Layer fc6, param blob 0 diff: 3.01911e-07
I0228 19:58:50.790235 6771 net.cpp:632] [Backward] Layer fc6, param blob 1 diff: 1.99256e-06
I0228 19:58:50.813318 6771 net.cpp:621] [Backward] Layer pool5, bottom blob conv5 diff: 1.84585e-09
I0228 19:58:50.826406 6771 net.cpp:621] [Backward] Layer relu5, bottom blob conv5 diff: 3.86034e-10
I0228 19:58:55.093768 6771 net.cpp:621] [Backward] Layer conv5, bottom blob conv4 diff: 5.76684e-10
I0228 19:58:55.093967 6771 net.cpp:632] [Backward] Layer conv5, param blob 0 diff: 1.47824e-06
I0228 19:58:55.093973 6771 net.cpp:632] [Backward] Layer conv5, param blob 1 diff: 1.92951e-06
I0228 19:58:55.114212 6771 net.cpp:621] [Backward] Layer relu4, bottom blob conv4 diff: 5.76684e-10
I0228 19:59:01.392058 6771 net.cpp:621] [Backward] Layer conv4, bottom blob conv3 diff: 2.31243e-10
I0228 19:59:01.392359 6771 net.cpp:632] [Backward] Layer conv4, param blob 0 diff: 1.76617e-07
I0228 19:59:01.392364 6771 net.cpp:632] [Backward] Layer conv4, param blob 1 diff: 8.78101e-07
I0228 19:59:01.412240 6771 net.cpp:621] [Backward] Layer relu3, bottom blob conv3 diff: 8.56331e-11
I0228 19:59:09.734658 6771 net.cpp:621] [Backward] Layer conv3, bottom blob norm2 diff: 7.87699e-11
I0228 19:59:09.735258 6771 net.cpp:632] [Backward] Layer conv3, param blob 0 diff: 1.33159e-07
I0228 19:59:09.735270 6771 net.cpp:632] [Backward] Layer conv3, param blob 1 diff: 1.47704e-07
I0228 19:59:10.390552 6771 net.cpp:621] [Backward] Layer norm2, bottom blob pool2 diff: 7.87615e-11
I0228 19:59:10.452433 6771 net.cpp:621] [Backward] Layer pool2, bottom blob conv2 diff: 1.50474e-11
I0228 19:59:10.516407 6771 net.cpp:621] [Backward] Layer relu2, bottom blob conv2 diff: 1.50474e-11
I0228 19:59:20.241587 6771 net.cpp:621] [Backward] Layer conv2, bottom blob norm1 diff: 2.07819e-11
I0228 19:59:20.241801 6771 net.cpp:632] [Backward] Layer conv2, param blob 0 diff: 3.61894e-09
I0228 19:59:20.241807 6771 net.cpp:632] [Backward] Layer conv2, param blob 1 diff: 1.05108e-07
I0228 19:59:35.405725 6771 net.cpp:621] [Backward] Layer norm1, bottom blob pool1 diff: 2.07819e-11
I0228 19:59:35.494249 6771 net.cpp:621] [Backward] Layer pool1, bottom blob conv1 diff: 4.26e-12
I0228 19:59:35.585350 6771 net.cpp:621] [Backward] Layer relu1, bottom blob conv1 diff: 3.25633e-12
I0228 19:59:38.335880 6771 net.cpp:632] [Backward] Layer conv1, param blob 0 diff: 9.37551e-09
I0228 19:59:38.335896 6771 net.cpp:632] [Backward] Layer conv1, param blob 1 diff: 5.86281e-08
E0228 19:59:38.411557 6771 net.cpp:721] [Backward] All net params (data, diff): L1 norm = (246967, 14.733); L2 norm = (103.38, 0.0470958)
I0228 19:59:38.411592 6771 solver.cpp:219] Iteration 70 (0.00886075 iter/s, 1128.57s/10 iters), loss = 0.693174
I0228 19:59:38.411600 6771 solver.cpp:238] Train net output #0: loss = 0.693174 (* 1 = 0.693174 loss)
I0228 19:59:38.411605 6771 sgd_solver.cpp:105] Iteration 70, lr = 1e-05
I0228 20:05:17.468423 6775 data_layer.cpp:73] Restarting data prefetching from start.
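One thing I noticed: the loss the net settles on (about 0.693) is very close to ln 2, which is what a softmax cross-entropy loss gives when both classes get equal scores. A small Python sketch of that arithmetic, assuming the loss layer is a softmax loss over two classes as in stock CaffeNet; softmax_cross_entropy is my own helper, not a Caffe function.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against a single integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

# Equal scores for both classes, i.e. a 50/50 prediction.
chance_loss = softmax_cross_entropy([0.0, 0.0], label=0)
print(chance_loss, math.log(2))
```

If that reading is right, a loss stuck near this value together with accuracy = 0.5 on a balanced set would mean the output is effectively the same for every image.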