If I understand the role of the loss function correctly, it directs training: the model is fitted by minimizing the loss value. So, for example, if I want my model to have the lowest possible mean absolute error, I should use MAE as the loss function. Why is it, then, that you sometimes see someone who wants the best possible accuracy but builds the model to minimize a completely different function? For example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
How can the model above be trained to give us the best acc when, during its training, it tries to minimize another function (MSE)? I know that, once the model is trained, its metric will give us the best acc found during training.
My doubt is: shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE? If done that way, wouldn't the model give us even higher accuracy, since it knows it has to maximize it during its training?
To start with, the code snippet you have used as an example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, and for such problems accuracy is meaningless (it is meaningful only for classification problems, where MSE is not a valid loss function). For details (including a code example), see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my own answer in this thread.
Continuing to your general question: in regression settings, usually we don't need a separate performance metric, and we normally use just the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be
model.compile(loss='mean_squared_error', optimizer='sgd')
without any metrics specified. We could of course use metrics=['mse'], but this is redundant and not really needed. Sometimes people use something like
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
i.e. optimise the model according to the MSE loss, but also report its performance in terms of the mean absolute error (MAE) in addition to the MSE.
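To make the loss/metric distinction concrete, here is a minimal pure-Python sketch. It is not Keras: the function names `mse`, `mae` and `fit`, the one-parameter model `y = w * x`, and the toy data are all made up for illustration. Only the MSE gradient updates the weight; MAE is just computed and recorded, which is roughly the role a metric plays.

```python
# Illustrative sketch (not Keras): fit y = w * x by gradient descent on MSE,
# while MAE is only reported and never influences the weight update.

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fit(xs, ys, lr=0.01, epochs=200):
    w = 0.0
    history = []
    for _ in range(epochs):
        preds = [w * x for x in xs]
        # Record loss and metric for this epoch.
        history.append((mse(ys, preds), mae(ys, preds)))
        # Gradient of the MSE with respect to w; MAE plays no part here.
        grad = sum(-2 * x * (y - p) for x, y, p in zip(xs, ys, preds)) / len(xs)
        w -= lr * grad
    return w, history

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # toy data, roughly y = 2x
w, history = fit(xs, ys)
print(f"w = {w:.3f}")
print("first epoch (mse, mae):", history[0])
print("last epoch  (mse, mae):", history[-1])
```

In this toy run both numbers go down together, but that is a property of the data and model, not a guarantee: the optimiser only ever sees the MSE.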
Now, your question:
shouldn't the focus of the model during its training to maximize acc (or minimize 1/acc) instead of minimizing MSE?
is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods in order to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a proxy differentiable function to use as loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
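A small pure-Python sketch of why accuracy gives gradient-based training nothing to work with, assuming a made-up one-weight logistic model and toy data (the names `sigmoid`, `accuracy`, `cross_entropy` and the finite-difference step `eps` are illustrative, not part of Keras). Nudging the weight slightly leaves the accuracy unchanged, while the cross entropy responds.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(w, xs, ys):
    # Threshold the predicted probability at 0.5, then count exact matches.
    preds = [1 if sigmoid(w * x) >= 0.5 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

def cross_entropy(w, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(ys)

# Toy data; the point x = 0.5 with label 0 is misclassified for every w > 0.
xs = [-2.0, -1.0, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

w, eps = 1.0, 1e-3
# Finite-difference slopes of each function with respect to w.
acc_slope = (accuracy(w + eps, xs, ys) - accuracy(w, xs, ys)) / eps
ce_slope = (cross_entropy(w + eps, xs, ys) - cross_entropy(w, xs, ys)) / eps

print("accuracy at w:", accuracy(w, xs, ys))
print("accuracy slope:", acc_slope)
print("cross-entropy slope:", ce_slope)
```

Accuracy is a step function of the weight: flat almost everywhere, with jumps only where a prediction flips. A gradient-based optimiser therefore gets no direction from it, whereas the cross entropy has a non-zero slope to follow.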
Rather unsurprisingly, this question of yours pops up from time to time, albeit with slight variations in context; see for example my own answers in
- Cost function training target versus accuracy desired goal
- Targeting a specific metric to optimize in tensorflow
For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:
- Loss & accuracy - Are these reasonable learning curves?
- How does Keras evaluate the accuracy?
Accuracy is not differentiable, so it cannot be used as a loss function with gradient-based training. It can only work as a metric.
If I understood it correctly, your question is: why optimise "loss" when we can optimise "accuracy"?
Short answer:
Of course you can! (Whether it will be good for convergence is another issue.) You see, both the loss (MSE in your case) and the accuracy are essentially ordinary functions, or, to be precise, equations, and you can choose any equation as your objective function.
Maybe this confusion arises from the use of things like "mse" and, even more confusing, "acc".
Check this file to get a clearer picture of what happens when you write "mse".
"acc" is a little bit more confusing. You see, when you write "acc", it has multiple meanings for Keras. Hence, based on which loss function you are using, Keras decides the best "acc" function for you. Check this file to see what happens when you write "acc".
Finally, answering your question: shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?
Well, to Keras, MSE and acc are nothing but functions. Keras optimises your model based on feedback from the function passed as loss in:
model.compile(loss=<function_to_take_feedback_from>, optimizer=<also_another_function>, metrics=<function_to_just_evaluate_and_print_result_hoping_this_printed_value_means_something_to_you_the_user>)
Summarising:
- For the loss argument, pass a function. If you do not want to do so, just write "mse" and Keras will pass the required function for you.
- For the metrics argument, pass a list of function(s). If you are lazy like me, simply ask Keras to do so by writing "acc".
Long answer:
Which function/equation should you use as your objective function?
That's for another day :)