I am new to TensorFlow and neural networks, and I am trying to create a model that just multiplies two float values together.
I wasn't sure how many neurons I would want, but I picked 10 neurons and tried to see where I could go from that. I figured that would probably introduce enough complexity in order to semi-accurately learn that operation.
Anyways, here is my code:
import tensorflow as tf
import numpy as np
# Teach how to multiply
def generate_data(how_many):
    data = np.random.rand(how_many, 2)
    answers = data[:, 0] * data[:, 1]
    return data, answers
sess = tf.InteractiveSession()
# Input data
input_data = tf.placeholder(tf.float32, shape=[None, 2])
correct_answers = tf.placeholder(tf.float32, shape=[None])
# Use 10 neurons--just one layer for now, but it'll be fully connected
weights_1 = tf.Variable(tf.truncated_normal([2, 10], stddev=.1))
bias_1 = tf.Variable(.1)
# Output of this will be a [None, 10]
hidden_output = tf.nn.relu(tf.matmul(input_data, weights_1) + bias_1)
# Weights
weights_2 = tf.Variable(tf.truncated_normal([10, 1], stddev=.1))
bias_2 = tf.Variable(.1)
# Softmax them together--this will be [None, 1]
calculated_output = tf.nn.softmax(tf.matmul(hidden_output, weights_2) + bias_2)
cross_entropy = tf.reduce_mean(correct_answers * tf.log(calculated_output))
optimizer = tf.train.GradientDescentOptimizer(.5).minimize(cross_entropy)
sess.run(tf.initialize_all_variables())
for i in range(1000):
    x, y = generate_data(100)
    sess.run(optimizer, feed_dict={input_data: x, correct_answers: y})
error = tf.reduce_sum(tf.abs(calculated_output - correct_answers))
x, y = generate_data(100)
print("Total Error: ", error.eval(feed_dict={input_data: x, correct_answers: y}))
It seems that the error is always around 7522.1, which is very bad for just 100 data points, so I assume it is not learning.
My questions: Is my machine learning? If so, what can I do to make it more accurate? If not, how can I make it learn?
There are a few major issues with the code. Aaron has already identified some of them, but there's another important one: calculated_output and correct_answers are not the same shape, so subtracting them broadcasts to a 2D matrix. (The shape of calculated_output is (100, 1) and the shape of correct_answers is (100,), so the difference has shape (100, 100).) You need to make the shapes match, for example by using tf.squeeze on calculated_output.
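To make the shape problem concrete, here is a small NumPy sketch (TensorFlow follows the same broadcasting rules). The arrays are stand-ins filled with placeholder values, not the real network output; only the shapes matter.

```python
import numpy as np

# Stand-ins for the two tensors, using the shapes described above.
calculated_output = np.ones((100, 1))   # column of predictions
correct_answers = np.zeros(100)         # flat vector of targets

# Broadcasting pairs every prediction with every target.
diff = calculated_output - correct_answers
print(diff.shape)    # (100, 100)

# Dropping the size-1 axis first gives the intended element-wise difference.
fixed = np.squeeze(calculated_output, axis=1) - correct_answers
print(fixed.shape)   # (100,)
```

Summing the absolute values of the (100, 100) result adds up 10,000 terms instead of 100, which is one reason the reported total error is so large.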
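You also don't need any non-linearities to get a rough fit on this input range, so you could get by with no activations and only one layer. (A linear layer cannot represent a product exactly, but for inputs in [0, 1) it comes reasonably close.) The following code gets a total error of about 6 (~0.06 error on average for each test point). Hope that helps!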
import tensorflow as tf
import numpy as np
# Teach how to multiply
def generate_data(how_many):
    data = np.random.rand(how_many, 2)
    answers = data[:, 0] * data[:, 1]
    return data, answers
sess = tf.InteractiveSession()
input_data = tf.placeholder(tf.float32, shape=[None, 2])
correct_answers = tf.placeholder(tf.float32, shape=[None])
weights_1 = tf.Variable(tf.truncated_normal([2, 1], stddev=.1))
bias_1 = tf.Variable(.0)
output_layer = tf.matmul(input_data, weights_1) + bias_1
mean_squared = tf.reduce_mean(tf.square(correct_answers - tf.squeeze(output_layer)))
optimizer = tf.train.GradientDescentOptimizer(.1).minimize(mean_squared)
sess.run(tf.initialize_all_variables())
for i in range(1000):
    x, y = generate_data(100)
    sess.run(optimizer, feed_dict={input_data: x, correct_answers: y})
error = tf.reduce_sum(tf.abs(tf.squeeze(output_layer) - correct_answers))
x, y = generate_data(100)
print("Total Error: ", error.eval(feed_dict={input_data: x, correct_answers: y}))
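As a sanity check on the ~0.06 figure, here is a sketch that solves for the best single linear layer directly with least squares instead of gradient descent. This is not part of the code above; the seed and sample size are arbitrary assumptions. For inputs uniform on [0, 1), the best linear fit to x*y is 0.5*x + 0.5*y - 0.25, which leaves a residual of (x - 0.5)*(y - 0.5) with a mean absolute value of 1/16 = 0.0625.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

data = rng.random((100_000, 2))
answers = data[:, 0] * data[:, 1]

# Design matrix [x, y, 1] so the fit includes a bias term.
design = np.column_stack([data, np.ones(len(data))])
coeffs, *_ = np.linalg.lstsq(design, answers, rcond=None)

mean_abs_error = np.mean(np.abs(design @ coeffs - answers))
print("weights and bias:", coeffs)
print("mean absolute error:", mean_abs_error)
```

So roughly 0.06 per point is about the floor for a purely linear model on this data. Getting below it needs a hidden layer with a non-linearity.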
The way you are using softmax doesn't fit this problem. Softmax is normally used when you want a probability distribution over a set of classes. In your code you have a single output value per example, and softmax over a single value always returns 1, so it is not helping you there.
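A quick NumPy sketch of what softmax does to a [None, 1] output. The softmax helper here is written by hand for illustration and normalizes over the last axis, which is also the default for tf.nn.softmax. The logits are random stand-ins, not values from the real network.

```python
import numpy as np

def softmax(logits):
    # Normalize over the last axis, subtracting the max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

logits = rng.normal(size=(100, 1)) * 10.0   # arbitrary stand-in logits
output = softmax(logits)
print(np.unique(output))   # every entry is 1.0, whatever the logits are

# The loss expression from the question is then log(1) = 0 everywhere.
answers = rng.random(100) * rng.random(100)
loss = np.mean(answers * np.log(output[:, 0]))
print(loss)

# Total error as computed in the question, including the (100, 100) broadcast.
total_error = np.sum(np.abs(output - answers))
print(total_error)
```

Because the output is constant, the loss has zero gradient with respect to every weight, so nothing is learned. The total error lands near 100 * 100 * (1 - 0.25) = 7500, which is consistent with the 7522.1 reported in the question.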
The cross-entropy loss function is appropriate for classification problems, but you are doing regression. Try a mean squared error loss function instead.
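For illustration, here is a minimal NumPy sketch of the same kind of network (one hidden layer of 10 units, linear output, no softmax) trained with mean squared error. It is not the original TensorFlow code: the tanh activation, initialization scale, learning rate, and step count are all assumptions picked only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

def generate_data(how_many):
    data = rng.random((how_many, 2))
    return data, data[:, 0] * data[:, 1]

# Hypothetical hyperparameters, picked only for illustration.
hidden_units, learning_rate, steps = 10, 0.1, 2000

weights_1 = rng.normal(0.0, 0.5, (2, hidden_units))
bias_1 = np.zeros(hidden_units)
weights_2 = rng.normal(0.0, 0.5, (hidden_units, 1))
bias_2 = np.zeros(1)

x, y = generate_data(1000)
losses = []
for _ in range(steps):
    # Forward pass: tanh hidden layer, then a linear output with no softmax.
    hidden = np.tanh(x @ weights_1 + bias_1)
    prediction = (hidden @ weights_2 + bias_2)[:, 0]  # shape (1000,), same as y
    error = prediction - y
    losses.append(np.mean(error ** 2))

    # Backward pass: gradients of the mean squared error.
    grad_prediction = (2.0 / len(y)) * error[:, None]
    grad_weights_2 = hidden.T @ grad_prediction
    grad_bias_2 = grad_prediction.sum(axis=0)
    grad_hidden = grad_prediction @ weights_2.T
    grad_pre_activation = grad_hidden * (1.0 - hidden ** 2)
    grad_weights_1 = x.T @ grad_pre_activation
    grad_bias_1 = grad_pre_activation.sum(axis=0)

    # Plain gradient descent step.
    weights_1 -= learning_rate * grad_weights_1
    bias_1 -= learning_rate * grad_bias_1
    weights_2 -= learning_rate * grad_weights_2
    bias_2 -= learning_rate * grad_bias_2

print("first loss:", losses[0], "last loss:", losses[-1])
```

Unlike the softmax and cross-entropy setup, the loss here depends on the weights, so gradient descent has something to follow and the loss goes down over training.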