I am doing binary classification with Keras, using loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam, and a final layer of keras.layers.Dense(1, activation=tf.nn.sigmoid).
As far as I know, the loss value is used to evaluate the model during the training phase. However, when I use Keras model evaluation on my testing dataset (e.g. m_recall.evaluate(testData, testLabel)), there are also loss values, accompanied by accuracy values, like the output below:
test size: (1889, 18525)
1889/1889 [==============================] - 1s 345us/step
m_acc: [0.5690245978371045, 0.9523557437797776]
1889/1889 [==============================] - 1s 352us/step
m_recall: [0.24519687695911097, 0.9359449444150344]
1889/1889 [==============================] - 1s 350us/step
m_f1: [0.502442331737344, 0.9216516675489677]
1889/1889 [==============================] - 1s 360us/step
metric name: ['loss', 'acc']
What is the meaning/usage of the loss during testing? Why is it so high (e.g. 0.5690 in m_acc)? The accuracy evaluation seems fine to me (e.g. 0.9523 in m_acc), but I am concerned about the loss too. Does it mean my model performs badly?
P.S. m_acc, m_recall, etc. are just how I name my models (they were trained using different metrics in GridSearchCV).
Update:
I just realized that the loss values are not percentages, so how are they calculated? And with the current values, are they good enough, or do I need to optimize the model further?
Suggestions for further reading are appreciated too!
When defining a machine learning model, we want a way to measure its performance so that we can compare it with other models to choose the best one, and also make sure that it is good enough. Therefore, we define metrics like accuracy (in the context of classification), which is the proportion of samples correctly classified by the model, to measure how our model performs and whether it is good enough for our task.
Although these metrics are easy for us to interpret, the problem is that they cannot be used directly by the learning process to tune the model's parameters. Instead, we define other measures, usually called loss functions or objective functions, which can be used directly by the training process (i.e. the optimization). These functions are usually defined such that low values should correspond to high accuracy. That's why you commonly see machine learning algorithms trying to minimize a loss function with the expectation that the accuracy increases; in other words, the models learn indirectly by optimizing the loss function. The loss values are important during training: if they are not decreasing, or are fluctuating wildly, it means there is a problem somewhere that needs to be fixed.
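To make the distinction concrete, here is a minimal sketch (the labels and predictions are made up purely for illustration) of how binary cross-entropy, the loss you are using, differs from accuracy computed on the same outputs:

import numpy as np
import tensorflow as tf

# Made-up ground-truth labels and sigmoid outputs, just for illustration
y_true = np.array([1., 0., 1., 1.])
y_pred = np.array([0.9, 0.2, 0.6, 0.95])

# Binary cross-entropy by hand: -mean(y*log(p) + (1-y)*log(1-p))
manual_bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# The same quantity as Keras computes it (up to numerical clipping)
keras_bce = tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy()

# Accuracy, by contrast, only checks whether the thresholded prediction matches
accuracy = np.mean((y_pred > 0.5).astype(float) == y_true)

print(manual_bce, keras_bce, accuracy)

Note that the loss is not bounded by 1, so a value like 0.57 is not "57%" of anything; it is just the average of the per-sample cross-entropy terms.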
As a result, what we are ultimately concerned about (i.e. when testing a model) is the value of the metrics (like accuracy) we initially defined, and we don't care much about the final value of the loss function. That's why you don't hear things like "the loss value of a [specific model] on the ImageNet dataset is 8.732"! That does not tell you anything about whether the model is great, good, bad or terrible. Rather, you would hear that "this model performs with 87% accuracy on the ImageNet dataset".
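For your specific output, evaluate simply returns one value per entry in metrics_names: the first is the test-set loss (the mean binary cross-entropy over your 1889 samples) and the second is the accuracy. A small sketch, assuming your compiled model is called model and your test arrays are testData and testLabel as in the question:

# evaluate returns one value per entry in model.metrics_names,
# e.g. ['loss', 'acc'] -> [0.569..., 0.952...]
results = model.evaluate(testData, testLabel, verbose=0)
for name, value in zip(model.metrics_names, results):
    print(name, value)

# The 'loss' here is the same binary cross-entropy used during training,
# averaged over the test set; it is not a percentage and has no fixed
# upper bound, so it is most useful for comparing models against each other.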