This is a three-part question:
1) Class size: I'm training the TF Object Detection API on 5 classes whose sizes aren't anywhere close to each other:
- No. of images in class1: 401
- No. of images in class2: 389
- No. of images in class3: 532
- No. of images in class4: 159393
- No. of images in class5: 185313
(total: 346,028 images)
Since this isn't a typical image classifier, I'm guessing this isn't strictly a class-imbalance problem, but I'm wondering whether it would affect the resulting model.
2) Can the TF Object Detection API be used to detect two objects where one is enclosed/bounded by the other?
Ex. face vs. person: the face lies within the bounds of the person.
3) This is a continuation of an earlier question, where I found that using Faster R-CNN means batch_size has to be set to 1.
Because of this, I'm not sure whether I have to wait for the global step during training to reach the number of images in the training set (approx. 340k in my custom dataset). I'm using a Tesla K80 GPU with 12 GB memory on Google Compute Engine with 4 vCPUs and 15 GB RAM. After about 2 days, I see the loss dropping well below 1:
INFO:tensorflow:global step 264250: loss = 0.2799 (0.755 sec/step)
INFO:tensorflow:global step 264251: loss = 0.0271 (0.787 sec/step)
INFO:tensorflow:global step 264252: loss = 0.1122 (0.677 sec/step)
INFO:tensorflow:global step 264253: loss = 0.1709 (0.797 sec/step)
INFO:tensorflow:global step 264254: loss = 0.8366 (0.790 sec/step)
INFO:tensorflow:global step 264255: loss = 0.0541 (0.741 sec/step)
INFO:tensorflow:global step 264256: loss = 0.0760 (0.781 sec/step)
INFO:tensorflow:global step 264257: loss = 0.0621 (0.777 sec/step)
How do I determine when to stop? I noticed that even at this point, the frozen inference graph I generate from the latest checkpoint file ONLY seems to detect the class with the most images (i.e. face) and doesn't detect anything else.
1) Yes, it will affect the outcome. More precisely, your model will become very good at recognising classes 4 and 5, and will have at best a vague idea of the others. Consider limiting the number of instances of classes 4 and 5 so they are at least in the same order of magnitude as the other classes. This is useful especially at the beginning of training, so the model builds a balanced representation of each class.
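As a minimal sketch, here is one way to subsample the dominant classes before generating your TFRecords. It assumes a simplified annotation format of (image_path, class_name) pairs with one class per image, and the cap of 5,000 is an arbitrary placeholder; detection images can of course contain several classes, so treat this as an illustration only:

    import random
    from collections import defaultdict

    # Assumed cap; tune it so classes 4 and 5 stay within roughly
    # one order of magnitude of the smaller classes.
    MAX_PER_CLASS = 5000

    def subsample(annotations, max_per_class=MAX_PER_CLASS, seed=42):
        """Keep at most max_per_class examples of each class."""
        random.seed(seed)
        by_class = defaultdict(list)
        for image_path, class_name in annotations:
            by_class[class_name].append((image_path, class_name))
        kept = []
        for items in by_class.values():
            random.shuffle(items)            # drop a random subset, not a prefix
            kept.extend(items[:max_per_class])
        random.shuffle(kept)                 # mix the classes back together
        return kept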
It is also very important to use data augmentation here (see this answer).
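With this API, augmentation is declared in the train_config section of your pipeline.config. A sketch with two common options; which options actually help depends on your images, so take these as examples rather than a recommendation:

    train_config {
      # ... your existing training settings ...
      data_augmentation_options {
        random_horizontal_flip {
        }
      }
      data_augmentation_options {
        random_adjust_brightness {
        }
      }
    }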
3) Normally, your model should take several epochs to train well, especially when you use data augmentation.
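To make "several epochs" concrete, here is the arithmetic from the numbers in your question (the total image count, batch_size of 1, and the ~0.77 s/step in your log); the exact figures will shift if you subsample classes 4 and 5:

    dataset_size = 346_028   # sum of the per-class image counts above
    batch_size = 1           # forced by the Faster R-CNN configuration
    sec_per_step = 0.77      # rough average from the training log

    steps_per_epoch = dataset_size / batch_size
    hours_per_epoch = steps_per_epoch * sec_per_step / 3600
    print(f"{steps_per_epoch:.0f} steps/epoch, ~{hours_per_epoch:.0f} h/epoch")
    # -> 346028 steps/epoch, ~74 h/epoch
    # So a global step of ~264k is still inside the first epoch.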
This is written all over SO and in the issues on the repository: you cannot tell whether the model has converged from the loss values alone! Consider this scenario: you have
shuffle: True
for your input images, and 344,706 images in classes 4 and 5. If the shuffle arranged them so that these images came before those from classes 1, 2 and 3, then your model has learnt a good representation so far, but when it encounters an image of class 1 it will overshoot, because of overfitting, and your loss will jump to some very high value. The solution is to run
eval.py
in parallel, as that gives you an idea of how the model performs on all classes, and you can stop when you're satisfied with that metric.
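A sketch of launching the eval job, with placeholder paths you would replace with your own (flags as in the TF1 object_detection eval.py):

    python eval.py \
        --logtostderr \
        --pipeline_config_path=path/to/pipeline.config \
        --checkpoint_dir=path/to/train_dir \
        --eval_dir=path/to/eval_dir

You can then point TensorBoard at eval_dir and watch the per-class detection metrics over time, which is a much better stopping signal than the training loss.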
Note that on Stack Overflow it is normal to ask separate questions when they address different subjects, because we are answering not only for you but also for all the future people in your position. So I'll answer 2) in a separate one :)