I would like to use the yolo architecture for object detection. Before training the network with my custom data, I followed these steps to train it on the Pascal VOC data: https://pjreddie.com/darknet/yolo/
The instructions are very clear. But after the final step
./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23
darknet immediately stops training and announces that weights have been written to the backups/
directory.
At first I thought that the pretraining was simply too good and that the stopping criteria would be reached at once.
So I've used the ./darknet detect
command with these weights on one of the test images data/dog
. Nothing is found.
If I don't use any pretrained weights, the network does train. I've edited cfg/yolo-voc.cfg to use
# Testing
#batch=1
#subdivisions=1
# Training
batch=32
subdivisions=8
Now the training process has been runnning for many hours and is keeping my gpu warm.
Is this the intended way to train darknet ? How can I use pretrained weights correctly, without training just breaking off ?
Is there any setting to create checkpoints, or get an idea of the progress ?
This is an old question so I hope you have your answer by now, but here is mine just in case it helps.
After working with darknet for about a month, I've run into most of the roadblocks that people have asked/posted about on forums. In your case, I'm pretty certain it's because the weights have been trained for the max number of batches already, and when the pre-trained weights were read in darknet assumed training was done.
Relevant personal experience: when I used one of the pretrained weights files, it started from iteration 40101 and ran until 40200 before cutting off.
I would stick to training from scratch if you have custom data, but if you want to try the pre-trained weights again, you might find that changing max batches in the cfg file helps.
Adding
-clear 1
at the end of your training command will clear the stats of how many images this model has seen in previous training. Then you can fine-tune your model on new data(set).You can find more info about the usage in the function signature
void train_detector(char *datacfg, char *cfgfile, char *weightfile, int *gpus, int ngpus, int clear)
at https://github.com/pjreddie/darknet/blob/b13f67bfdd87434e141af532cdb5dc1b8369aa3b/examples/detector.cI doubt it that increasing the max number of iterations is a good idea, as the learning rates are usually associated with current # of iteration. We usually increase the max # of iterations, when we want to resume a previous training task that ended because of reaching the max # of iterations, but we believe that with more iterations, it will give better results.
FYI, when you have a small dataset, training on it from scratch or from a classification network may not be a great idea. You may still want to re-use the weights from a detection network trained on large dataset like Coco or ImageNet.
Also if using AlexeyAB/darknet they might have a problem with -clear option, in detector.c:
should really be:
otherwise the training loop will exit immediately.