I am trying to optimize a model with the following two loss functions
def loss_1(pred, weights, logits):
weighted_sparse_ce = kls.SparseCategoricalCrossentropy(from_logits=True)
policy_loss = weighted_sparse_ce(pred, logits, sample_weight=advantages)
and
def loss_2(y_pred, y):
return kls.mean_squared_error(y_pred, y)
however, because TensorFlow 2 expects loss function to be of the form
def fn(y_pred, y_true):
...
I am using a work-around for loss_1
where I pack pred
and weights
into a single tensor before passing to loss_1
in the call to model.fit
and then unpack them in loss_1
. This is inelegant and nasty because pred
and weights
are of different data types and so this requires an additional cast, pack, un-pack and un-cast each time I call model.fit
.
Furthermore, I am aware of the sample_weight
argument to fit
, which is kind of like the solution to this question. This might be a workable solution were it not for the fact that I am using two loss functions and I only want the sample_weight
applied to one of them. Also, even if this were a solution, would it not be generalizable to other types of custom loss functions.
All that being said, my question, said concisely, is:
What is the best way to create a loss function with an arbitrary number of
arguments in TensorFlow 2?
Another thing I have tried is passing a tf.tuple
but that also seems to violate TensorFlow's desires for a loss function input.
This problem can be easily solved using custom training in TF2. You need only compute your two-component loss function within a GradientTape
context and then call an optimizer with the produced gradients. For example, you could create a function custom_loss
which computes both losses given the arguments to each:
def custom_loss(model, loss1_args, loss2_args):
# model: tf.model.Keras
# loss1_args: arguments to loss_1, as tuple.
# loss2_args: arguments to loss_2, as tuple.
with tf.GradientTape() as tape:
l1_value = loss_1(*loss1_args)
l2_value = loss_2(*loss2_args)
loss_value = [l1_value, l2_value]
return loss_value, tape.gradient(loss_value, model.trainable_variables)
# In training loop:
loss_values, grads = custom_loss(model, loss1_args, loss2_args)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
In this way, each loss function can take an arbitrary number of eager tensors, regardless of whether they are inputs or outputs to the model. The sets of arguments to each loss function need not be disjoint as shown in this example.
In tf 1.x we have tf.nn.weighted_cross_entropy_with_logits
function which allows us trade off recall and precision by adding extra positive weights for each class. In multi-label classification, it should be a (N,) tensor or numpy array. However, in tf 2.0, I haven't found similar loss functions yet, so I wrote my own loss function with extra arguments pos_w_arr
.
from tensorflow.keras.backend import epsilon
def pos_w_loss(pos_w_arr):
"""
Define positive weighted loss function
"""
def fn(y_true, y_pred):
_epsilon = tf.convert_to_tensor(epsilon(), dtype=y_pred.dtype.base_dtype)
_y_pred = tf.clip_by_value(y_pred, _epsilon, 1. - _epsilon)
cost = tf.multiply(tf.multiply(y_true, tf.math.log(
_y_pred)), pos_w_arr)+tf.multiply((1-y_true), tf.math.log(1-_y_pred))
return -tf.reduce_mean(cost)
return fn
Not sure what do you mean it wouldn't work when using eager tensors or numpy array as inputs though. Please correct me if I'm wrong.