Is there a rule of thumb for how big the number of samples should be for the label that represents "everything else" in a multi-class classification task?
Example: I want to classify my input as being one of X classes. The (X + 1)th class activates when the input is "none of the above." Suppose my dataset contains 5,000 samples from each of the 10 "positive" classes. For samples representing the "unknown" class, I'd use multiple realistic examples likely to be found in production, but that are not from the other classes.
How many of these negative examples should there be relative to the other classes?
This may be a bit off-topic here, but in any case I don't think there is a general rule of thumb: it depends on your problem and your approach.
I would consider the following factors:
Unfortunately, the only good way to tell whether you are doing OK is to experiment and track good metrics on a representative test dataset (confusion matrix, per-class precision/recall, etc.).
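To make the metrics part concrete, here is a minimal sketch of computing per-class precision and recall from true and predicted labels, so you can see how the "none of the above" class behaves next to the positive classes. The labels, the toy data, and the function names (`confusion_counts`, `per_class_precision_recall`) are made up for illustration; in practice you would run this on your real held-out set.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = confusion_counts(y_true, y_pred)
    report = {}
    for label in labels:
        true_positives = counts[(label, label)]
        # Everything the model predicted as this label (a confusion-matrix column).
        predicted = sum(counts[(t, label)] for t in labels)
        # Everything that really is this label (a confusion-matrix row).
        actual = sum(counts[(label, p)] for p in labels)
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


# Hypothetical toy data: two positive classes plus the catch-all "other".
y_true = ["cat", "cat", "dog", "dog", "other", "other", "other", "other"]
y_pred = ["cat", "other", "dog", "dog", "other", "other", "cat", "other"]

report = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

If you retrain with different amounts of "other" data and compare these numbers each time, you can see directly whether the catch-all class is starting to swallow the positive classes (their recall drops) or is being missed (its own recall drops). If you already use scikit-learn, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` give you the same information.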