Why does my GradientDescentOptimizer produce NaN?

Question:

I'm currently reworking the assignments from Professor Andrew Ng's "Machine Learning" course on Coursera, and I've gotten stuck on the logistic regression portion.

import numpy as np
import tensorflow as tf

filename = 'data/ex2data1.txt'
data = np.loadtxt(filename, delimiter = ",", unpack = True)

# Data matrices
xtr = np.transpose(np.array(data[:-1]))
ytr = np.transpose(np.array(data[-1:]))

# Initial weights
W = tf.Variable(tf.zeros([2,1], dtype = tf.float64))

# Bias
b = tf.Variable(tf.zeros([1], dtype = tf.float64))

# Cost function
y_ = tf.nn.sigmoid(tf.matmul(xtr,W) + b)

cost = -tf.reduce_mean(ytr*tf.log(y_) + (1-ytr)*tf.log(1-y_))
optimize = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

corr = tf.equal(tf.argmax(ytr,1), tf.argmax(y_,1))
acc = tf.reduce_mean(tf.cast(corr, tf.float64))

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run(cost))
    for _ in range(3):
        sess.run(optimize)
        print(sess.run(cost))

This produces the following output:

0.69314718056
nan
nan
nan

The first value of the cost function is correct, but the next ones are supposed to be:

3.0133
1.5207
0.7336

and instead I get a bunch of NaNs. I've tried lower learning rates, all to no avail. What am I doing wrong? Is it even possible to reproduce this assignment in TensorFlow?

PS: Other Python solutions seem to use scipy.optimize, but I have no idea how I would use that with TensorFlow values, and I would like to stick to TensorFlow if at all possible.

EDIT: I've also tried initializing the bias with tf.ones instead of tf.zeros, but that didn't work either.

Answer 1:

Your logarithm isn't sanitizing its input. It can very well happen that the log receives a value of zero (or below), which quickly turns any floating-point arithmetic into NaN.
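For instance, here is a quick NumPy sketch of the failure mode (hypothetical values, not taken from your code):

import numpy as np

y = np.float64(0.0)     # a sigmoid output that has saturated to exactly 0
print(np.log(y))        # -inf (NumPy warns about divide by zero)
print(0.0 * np.log(y))  # nan, because 0 * -inf is undefined

Once a single NaN enters the cost, the gradients become NaN as well, the update poisons W and b, and every cost printed afterwards is NaN, which is exactly the pattern in your output.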

Here is what I did in Java code that makes heavy use of logs in a similar domain:

  • Check the input for NaN or infinity and return zero
  • If the input is negative (or zero), clip the output to some static number, e.g. log(1e-5) ≈ -11.51
  • Otherwise just take the log

In Java that code looks like this; it shouldn't be difficult to translate to tf:

public static double guardedLogarithm(double input) {
    if (Double.isNaN(input) || Double.isInfinite(input)) {
        return 0d;
    } else if (input <= 0d) {
        // assume a quite low value: log(1e-5) ~= -11.51
        return -11.51d;
    } else {
        // FastMath comes from Apache Commons Math; Math.log works too
        return FastMath.log(input);
    }
}
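Translated to the TensorFlow 1.x API used in your question, the same guard can be written with tf.clip_by_value. This is a minimal sketch; guarded_log and the 1e-5 floor are my choices, not a fixed API:

import tensorflow as tf

def guarded_log(x, floor=1e-5):
    # Clip the input away from zero before taking the log;
    # log(1e-5) ~= -11.51, matching the static value in the Java version.
    # The upper bound of 1.0 is safe here since a sigmoid output never exceeds it.
    return tf.log(tf.clip_by_value(x, floor, 1.0))

# Applied to the cost from the question (assuming ytr and y_ as defined there):
# cost = -tf.reduce_mean(ytr * guarded_log(y_) + (1 - ytr) * guarded_log(1 - y_))

With the log inputs clipped this way, the cost stays finite even when the sigmoid saturates, and GradientDescentOptimizer can take its steps without blowing up.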