Multitask deep learning with Tensorflow

Have someone tried doing multitask deep learning with TensorFlow? That is, sharing the bottom layers while not sharing the top layers. An example with simple illustration would help a lot.

There is an similar question here, the answer used keras.

It's similar when just using tensorflow. The idea is this: we can define multiple outputs of a network, and thus multiple loss functions (objectives). We then tell optimizer to minimize a combined loss function, usually using a linear combination.

A concept diagram

This diagram is drawn according to this paper.

Let's say we are training a classifier that predict the digit in the image, with maximum 5 digits per image. Here we defined 6 output layer: digit1, digit2, digit3, digit4, digit5, length. The digit layer should output 0~9 if there is such a digit, or X(substitute it with an real number in practice) if there isn't any digit in its position. Same thing for length, it should output 0~5 if the image contains 0~5 digit, or X if it contains more than 5 digits.

Now to train it, we just add up all the cross entropy loss of each softmax function:

# Define loss and optimizer
lossLength = tf.log(tf.clip_by_value(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(length_logits, true_length)), 1e-37, 1e+37))
lossDigit1 = tf.log(tf.clip_by_value(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(digit1_logits, true_digit1)), 1e-37, 1e+37))
lossDigit2 = tf.log(tf.clip_by_value(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(digit2_logits, true_digit2)), 1e-37, 1e+37))
lossDigit3 = tf.log(tf.clip_by_value(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(digit3_logits, true_digit3)), 1e-37, 1e+37))
lossDigit4 = tf.log(tf.clip_by_value(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(digit4_logits, true_digit4)), 1e-37, 1e+37))
lossDigit5 = tf.log(tf.clip_by_value(tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(digit5_logits, true_digit5)), 1e-37, 1e+37))

cost = tf.add(
        tf.add(
        tf.add(
        tf.add(
        tf.add(cL,lossDigit1),
        lossDigit2),
        lossDigit3),
        lossDigit4),
        lossDigit5)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)