I'm doing the Toxic Comment Text Classification Kaggle challenge. There are 6 classes: ['threat', 'severe_toxic', 'obscene', 'insult', 'identity_hate', 'toxic']. A comment can belong to multiple of these classes, so it's a multi-label classification problem.
I built a basic neural network with Keras as follows:
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

model = Sequential()
model.add(Embedding(10000, 128, input_length=250))  # vocab of 10,000, sequences padded to 250 tokens
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(len(classes), activation='sigmoid'))  # one sigmoid output per label
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
I run this line:
model.fit(X_train, train_y, validation_split=0.5, epochs=3)
and get 99.11% accuracy after 3 epochs.
However, 99.11% accuracy is a good bit higher than the best Kaggle submission. This makes me think I'm either a) overfitting or b) misusing Keras's accuracy metric (or possibly both).
1) It seems a bit hard to overfit when I'm using 50% of my data as a validation split and training for only 3 epochs.
2) Is accuracy here just the percentage of the time the model gets each class correct? So if I output [0, 0, 0, 0, 0, 1] and the correct output was [0, 0, 0, 0, 0, 0], my accuracy would be 5/6?
After a bit of thought, I sort of think the accuracy metric here is just looking at the class my model predicts with highest confidence and comparing it to the ground truth. So if my model outputs [0, 0, 0.9, 0, 0, 0], it will compare the class at index 2 ('obscene') with the true value. Do you think this is what's happening?
Thanks for any help you can offer!
For multi-label classification, I think it is correct to use sigmoid as the activation and binary_crossentropy as the loss.

If the output is sparse multi-label, meaning a few positive labels and a majority of negative labels, the Keras accuracy metric will be inflated by the correctly predicted negative labels. If I remember correctly, Keras does not choose the label with the highest probability. Instead, for binary classification, the threshold is 50%. So the prediction would be [0, 0, 0, 0, 0, 1], and if the actual labels were [0, 0, 0, 0, 0, 0], the accuracy would be 5/6. You can test this hypothesis by creating a model that always predicts the negative label and looking at the accuracy. If that's indeed the case, you may try a different metric such as top_k_categorical_accuracy.
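For instance, a minimal sketch of that check, using a made-up sparse label matrix in place of the real validation labels, might look like this:

import numpy as np

# Made-up sparse label matrix standing in for the real validation labels
# (roughly 3% positives per label, 6 labels per comment).
y_val = np.random.binomial(1, 0.03, size=(1000, 6))

# A "model" that always predicts the negative label for everything.
always_negative = np.zeros_like(y_val)

# Keras's accuracy here rounds each prediction to 0 or 1 and averages the
# elementwise matches, so an all-negative predictor already scores very high
# on sparse labels.
baseline_acc = np.mean(np.round(always_negative) == y_val)
print(baseline_acc)  # roughly 0.97 with 3% positives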
Another remote possibility I can think of is your training data. Are the labels y somehow "leaked" into x? Just a wild guess.
You can refer to the Keras Metrics documentation to see all metrics available (e.g. binary_accuracy). You can also create your own custom metric (and make sure it does exactly what you expect). I wanted to make sure neurite was right about how the accuracy is computed, so this is what I did (note: activation="sigmoid"):
Running the training, you will see that the custom_acc is always equal to the binary_accuracy (and therefore to the reported acc).

Now you can refer to the Keras code on GitHub to see how it is computed:
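From memory, the binary_accuracy definition in keras/metrics.py boils down to something like this (check the source for the exact current version):

from keras import backend as K

def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)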
This confirms what neurite said (i.e. if the prediction is [0, 0, 0, 0, 0, 1] and the actual labels were [0, 0, 0, 0, 0, 0], the accuracy would be 5/6).