I am trying to build a TensorFlow network that does multi-label predictions. Using softmax with one-hot (single-label) targets works correctly: the accuracy is calculated properly and the network learns as it should.
My basic network setup is:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

X = tf.placeholder(features.dtype, (None, 300), name="input")
y = tf.placeholder(hots.dtype, (None, 64), name="labels")

with tf.name_scope("dnn"):
    hidden1 = fully_connected(X, 900, scope="hidden1")
    hidden2 = fully_connected(hidden1, 450, scope="hidden2")
    hidden3 = fully_connected(hidden2, 225, scope="hidden3")
    # max is the number of output labels (64 here, matching the shape of y)
    logits = fully_connected(hidden3, max, scope="outputs", activation_fn=None)

with tf.name_scope("loss"):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

learning_rate = 0.05

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, tf.argmax(y, 1), 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
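For context, I run this graph with a standard TF1 session loop along these lines (a minimal sketch; shuffle_batches, n_epochs and batch_size are placeholders standing in for my actual data pipeline, not part of the setup above):

n_epochs = 40
batch_size = 50  # assumed value, not from the setup above

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        # shuffle_batches is a hypothetical helper yielding (features, labels) mini-batches
        for X_batch, y_batch in shuffle_batches(features, hots, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # evaluate on the full training set after each epoch
        acc_train = accuracy.eval(feed_dict={X: features, y: hots})
        print(epoch, "Train accuracy:", acc_train)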
Because the goal is multi-label prediction, I changed the loss and the accuracy (based on "Tensorflow, multi label accuracy calculation"):
with tf.name_scope("loss"):
xentropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.cast(y, tf.float32), logits=logits)
loss = tf.reduce_mean(xentropy, name="loss")
with tf.name_scope("eval"):
correct_prediction = tf.equal(tf.round(tf.nn.sigmoid(logits)), tf.round(tf.cast(y, tf.float32)))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
However, this results in an accuracy of Train accuracy: 0.984375 / Test accuracy: 0.984375 for every epoch (with the single-label one-hot data). It doesn't change; it's always this number. (Notably, 63/64 = 0.984375, which is exactly what this elementwise accuracy would report if the network predicted all zeros against one-hot labels over 64 classes.)
I tested a lot of accuracy calculations for multi-label, but couldn't find one that actually gives me proper results. What am I missing here?
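For reference, one stricter alternative is a per-example exact-match accuracy, which only counts an example as correct when every one of its 64 labels matches (a sketch with a 0.5 threshold; not one of the exact snippets I tried):

with tf.name_scope("eval"):
    # per-label 0/1 predictions and targets as booleans
    predicted = tf.greater_equal(tf.nn.sigmoid(logits), 0.5)
    target = tf.greater_equal(tf.cast(y, tf.float32), 0.5)
    # an example is correct only if all of its labels match exactly
    example_correct = tf.reduce_all(tf.equal(predicted, target), axis=1)
    exact_match_accuracy = tf.reduce_mean(tf.cast(example_correct, tf.float32))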
In the end, after countless attempts to get this fixed, it turned out that everything was fine except for the optimizer. When using AdamOptimizer() instead of GradientDescentOptimizer(learning_rate), the network started learning fast, going from 0.7 to 0.97 accuracy in 40 epochs. Maybe some tweaking of the learning rate would also have made gradient descent work, but for now this is finally solved!
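For completeness, the only change that fixed it was in the train scope (AdamOptimizer() with no arguments uses its default learning rate of 0.001):

with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer()  # default learning rate (0.001)
    training_op = optimizer.minimize(loss)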