Model not learning in TensorFlow


Question:

I am new to TensorFlow and neural networks, and I am trying to create a model that just multiplies two float values together.

I wasn't sure how many neurons I would want, so I picked 10 and tried to see where I could go from there. I figured that would probably introduce enough complexity to learn the operation at least semi-accurately.

Anyways, here is my code:

import tensorflow as tf
import numpy as np

# Teach how to multiply
def generate_data(how_many):
    data = np.random.rand(how_many, 2)
    answers = data[:, 0] * data[:, 1]
    return data, answers


sess = tf.InteractiveSession()

# Input data
input_data = tf.placeholder(tf.float32, shape=[None, 2])
correct_answers = tf.placeholder(tf.float32, shape=[None])

# Use 10 neurons--just one layer for now, but it'll be fully connected
weights_1 = tf.Variable(tf.truncated_normal([2, 10], stddev=.1))
bias_1 = tf.Variable(.1)


# Output of this will be a [None, 10]
hidden_output = tf.nn.relu(tf.matmul(input_data, weights_1) + bias_1)

# Weights
weights_2 = tf.Variable(tf.truncated_normal([10, 1], stddev=.1))

bias_2 = tf.Variable(.1)
# Softmax them together--this will be [None, 1]
calculated_output = tf.nn.softmax(tf.matmul(hidden_output, weights_2) + bias_2)

cross_entropy = tf.reduce_mean(correct_answers * tf.log(calculated_output))

optimizer = tf.train.GradientDescentOptimizer(.5).minimize(cross_entropy)

sess.run(tf.initialize_all_variables())

for i in range(1000):
    x, y = generate_data(100)
    sess.run(optimizer, feed_dict={input_data: x, correct_answers: y})

error = tf.reduce_sum(tf.abs(calculated_output - correct_answers))

x, y = generate_data(100)
print("Total Error: ", error.eval(feed_dict={input_data: x, correct_answers: y}))

It seems that the error is always around 7522.1, which is very bad for just 100 data points, so I assume it is not learning.

My questions: Is my machine learning? If so, what can I do to make it more accurate? If not, how can I make it learn?

Answer 1:

There are a few major issues with the code. Aaron has already identified some of them, but there's another important one: calculated_output and correct_answers are not the same shape, so broadcasting turns the subtraction into a (100, 100) matrix rather than an element-wise difference. (The shape of calculated_output is (100, 1) while the shape of correct_answers is (100,).) You need to make the shapes match, for example by applying tf.squeeze to calculated_output.
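
For illustration, here is a minimal NumPy sketch (shapes picked to match the example above, not part of either code listing) showing how that broadcast blows up:

import numpy as np

predictions = np.zeros((100, 1))  # same shape as calculated_output
targets = np.zeros((100,))        # same shape as correct_answers

# Broadcasting turns the subtraction into a pairwise difference matrix
print((predictions - targets).shape)            # (100, 100) -- not what was intended
print((predictions.squeeze() - targets).shape)  # (100,) -- the intended element-wise difference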

This problem also doesn't really require any non-linearities, so you could get by with no activations and only one layer. The following code gets a total error of about 6 (~0.06 error on average for each test point). Hope that helps!

import tensorflow as tf
import numpy as np


# Teach how to multiply
def generate_data(how_many):
    data = np.random.rand(how_many, 2)
    answers = data[:, 0] * data[:, 1]
    return data, answers


sess = tf.InteractiveSession()

# Placeholders for the two input factors and the target products
input_data = tf.placeholder(tf.float32, shape=[None, 2])
correct_answers = tf.placeholder(tf.float32, shape=[None])

# A single linear layer: two inputs, one output, no activation
weights_1 = tf.Variable(tf.truncated_normal([2, 1], stddev=.1))
bias_1 = tf.Variable(.0)

output_layer = tf.matmul(input_data, weights_1) + bias_1

# Mean squared error between the targets and the (squeezed) predictions
mean_squared = tf.reduce_mean(tf.square(correct_answers - tf.squeeze(output_layer)))
optimizer = tf.train.GradientDescentOptimizer(.1).minimize(mean_squared)

sess.run(tf.initialize_all_variables())

for i in range(1000):
    x, y = generate_data(100)
    sess.run(optimizer, feed_dict={input_data: x, correct_answers: y})

error = tf.reduce_sum(tf.abs(tf.squeeze(output_layer) - correct_answers))

x, y = generate_data(100)
print("Total Error: ", error.eval(feed_dict={input_data: x, correct_answers: y}))


Answer 2:

The way you are using softmax is odd. Softmax is normally used when you want a probability distribution over a set of classes. Your network has a single output value per example, and softmax over a single value always returns 1.0, so it is not helping you here.
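
As a quick sanity check (a minimal standalone sketch, not part of the original model), softmax applied to a single logit always produces 1.0, so the model can never output anything else:

import tensorflow as tf

sess = tf.InteractiveSession()

# Shape (3, 1): one "class" per row, just like calculated_output in the question
single_logit = tf.constant([[0.3], [-2.0], [5.0]])
print(sess.run(tf.nn.softmax(single_logit)))  # [[1.], [1.], [1.]] -- every row is 1.0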

The cross-entropy loss function is appropriate for classification problems, but you are doing regression. You should try a mean squared error loss function instead.
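
Concretely, the minimal change would look something like the sketch below. It reuses the variable names from the question (hidden_output, weights_2, bias_2, correct_answers), assumes everything else stays the same, and the learning rate of .1 is just a guess:

# Keep the output layer linear -- no softmax -- since this is regression
calculated_output = tf.matmul(hidden_output, weights_2) + bias_2

# Mean squared error between the targets and the (squeezed) predictions
mean_squared = tf.reduce_mean(tf.square(correct_answers - tf.squeeze(calculated_output)))
optimizer = tf.train.GradientDescentOptimizer(.1).minimize(mean_squared)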