I've built an MLP with Google's TensorFlow library. The network runs, but it refuses to learn properly: it always converges to an output of nearly 1.0, no matter what the input actually is.
The complete code can be seen here.
Any ideas?
The input and output data (batch size 4) are as follows:
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [[0.], [1.], [1.], [0.]] # XOR output
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")
Hidden layer configuration:
# number of hidden neurons (the debug output below shows 5)
hidden_nodes = 5

# hidden layer's bias neuron
b_hidden = tf.Variable(0.1, name="hidden_bias")
# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")
# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
Output layer configuration:
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights") # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output)) # calc output layer's activation
My learning method looks like this:
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.01) # take a gradient descent for optimizing
train = optimizer.minimize(loss) # let the optimizer train
I tried both setups for cross entropy:
cross_entropy = -tf.reduce_sum(n_output * tf.log(output))
and
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output)
where n_output is the target output as given in output_data, and output is the value predicted/computed by my network.
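For reference, tf.nn.sigmoid_cross_entropy_with_logits expects unscaled logits and applies the sigmoid internally, so the second variant is normally wired to the pre-sigmoid value rather than to output. A sketch of that wiring (the name logits is mine; written against the TF 1.x keyword-argument API, older versions take the arguments positionally):

# sketch: compute the loss on the pre-sigmoid logits instead of the sigmoid output
logits = tf.matmul(hidden, W_output)   # pre-activation of the output layer
output = tf.sigmoid(logits)            # sigmoid kept only for the readable prediction
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=n_output, logits=logits)
loss = tf.reduce_mean(cross_entropy)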
The training inside the for-loop (for n epochs) goes like this:
cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
feed_dict={n_input: input_data, n_output: output_data})
I save the results in cvalues for debug printing of loss, W_hidden, and so on.
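Put together, the training loop looks roughly like this (a sketch; the session setup and print interval are placeholders, the complete code is in the link above):

sess = tf.Session()
sess.run(tf.global_variables_initializer())  # tf.initialize_all_variables() on older TF versions

for epoch in range(2000):
    # run one training step and fetch the values used for debug printing
    cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                       feed_dict={n_input: input_data, n_output: output_data})
    if epoch % 500 == 0:
        print("step: %d" % epoch)
        print("loss: %s" % cvalues[1])
        print("b_hidden: %s" % cvalues[3])
        print("W_hidden: %s" % cvalues[2])
        print("W_output: %s" % cvalues[4])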
No matter what I've tried, when I test my network to validate the output, it always produces something like this:
(...)
step: 2000
loss: 0.0137040186673
b_hidden: 1.3272010088
W_hidden: [[ 0.23195425 0.53248233 -0.21644847 -0.54775208 0.52298909]
[ 0.73933059 0.51440752 -0.08397482 -0.62724304 -0.53347367]]
W_output: [[ 1.65939867]
[ 0.78912479]
[ 1.4831928 ]
[ 1.28612828]
[ 1.12486529]]
(--- finished with 2000 epochs ---)
(Test input for validation:)
input: [0.0, 0.0] | output: [[ 0.99339396]]
input: [0.0, 1.0] | output: [[ 0.99289012]]
input: [1.0, 0.0] | output: [[ 0.99346077]]
input: [1.0, 1.0] | output: [[ 0.99261558]]
So it is not learning properly but always converging to nearly 1.0 no matter which input is fed.
In the meantime, with the help of a colleague, I was able to fix my solution and want to post it for completeness. My solution works with cross entropy and without altering the training data. Additionally, it keeps the desired input shape of (1, 2), and the output is a scalar.
It makes use of an AdamOptimizer, which decreases the error much faster than a GradientDescentOptimizer. See this post for more information (& questions^^) about the optimizer. In fact, my network produces reasonably good results in only 400-800 learning steps. After 2000 learning steps the output is nearly "perfect".
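A minimal sketch of what the fixed setup looks like (condensed rather than verbatim; written against the TF 1.x API; the learning rate, epoch count, and per-layer biases are illustrative, and for brevity it feeds all four samples at once, though a single (1, 2) sample per step works the same way through the [None, 2] placeholder):

import tensorflow as tf

input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [[0.], [1.], [1.], [0.]]                 # XOR output

n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

hidden_nodes = 5

# hidden layer
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")
b_hidden = tf.Variable(tf.zeros([hidden_nodes]), name="hidden_bias")
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

# output layer: keep the pre-sigmoid logits for the loss
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights")
b_output = tf.Variable(tf.zeros([1]), name="output_bias")
logits = tf.matmul(hidden, W_output) + b_output
output = tf.sigmoid(logits)  # sigmoid only for the readable prediction

# cross entropy on the logits, averaged over the batch
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=n_output, logits=logits)
loss = tf.reduce_mean(cross_entropy)

# AdamOptimizer converges much faster here than plain gradient descent
train = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(2000):
        sess.run(train, feed_dict={n_input: input_data, n_output: output_data})
    print(sess.run(output, feed_dict={n_input: input_data}))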
I can't comment because I don't have enough reputation, but I have some questions on that answer, mrry. The $L_2$ loss function makes sense because it is basically the MSE function, but why wouldn't cross-entropy work? It certainly works for other NN libraries. Second of all, why in the world would translating the input space from $[0, 1]$ to $[-1, 1]$ have any effect, especially since you added bias vectors?
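(For reference, the $L_2$/MSE-style loss mentioned in that comment would correspond to something like the following in this code; a sketch, not taken from the answer being discussed:)

# L2 / MSE-style loss on the sigmoid output
loss = tf.reduce_mean(tf.square(n_output - output))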
EDIT: This is a solution using cross entropy and one-hot encoding, compiled from multiple sources. EDIT^2: Changed the code to use cross-entropy without any extra encoding or any weird target value shifting.