Why does my GradientDescentOptimizer produce NaN?

I'm reimplementing the assignments from Professor Andrew Ng's "Machine Learning" course on Coursera in TensorFlow, and I got stuck on the logistic regression part. Here is my code:

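import numpy as np
import tensorflow as tf

# Columns: two feature columns followed by the 0/1 label
filename = 'data/ex2data1.txt'
data = np.loadtxt(filename, delimiter = ",", unpack = True)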

# Data matrices
xtr = np.transpose(np.array(data[:-1]))
ytr = np.transpose(np.array(data[-1:]))

# Initial weights
W = tf.Variable(tf.zeros([2,1], dtype = tf.float64))

# Bias
b = tf.Variable(tf.zeros([1], dtype = tf.float64))

# Cost function
y_ = tf.nn.sigmoid(tf.matmul(xtr,W) + b)

cost = -tf.reduce_mean(ytr*tf.log(y_) + (1-ytr)*tf.log(1-y_))
optimize = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

corr = tf.equal(tf.argmax(ytr,1), tf.argmax(y_,1))
acc = tf.reduce_mean(tf.cast(corr, tf.float64))

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run(cost))
    for _ in range(3):
        sess.run(optimize)
        print(sess.run(cost))

This prints:

0.69314718056
nan
nan
nan

The first cost value is correct, but the next ones should be:

3.0133
1.5207
0.7336

but instead I get NaNs. I've tried lower learning rates, to no avail. What am I doing wrong? And is it possible to reproduce this assignment in TensorFlow at all?

PS: Other Python solutions seem to use scipy.optimize, but I have no idea how I would use that with TensorFlow values, and I would prefer to use only TensorFlow if at all possible.

EDIT: I've also tried initializing the bias with tf.ones instead of tf.zeros, but that didn't work either.

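Your logarithm isn't sanitizing its input. The sigmoid output is never negative, but in floating point it can round to exactly 0.0 or 1.0 once the logit xW + b is large in magnitude. Then tf.log(y_) or tf.log(1-y_) evaluates to -inf, the label term multiplies that -inf by zero, and 0 * -inf is NaN, which quickly spreads through every floating-point operation that follows.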
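A minimal sketch of that failure mode in plain Python, assuming float64 as in the question. `sigmoid` is a hypothetical helper using the textbook logistic formula. Python's math.log raises an error on 0.0 instead of returning -inf the way tf.log does, so the -inf is written out directly:

```python
import math

def sigmoid(z: float) -> float:
    # Textbook logistic function in float64 (hypothetical helper)
    return 1.0 / (1.0 + math.exp(-z))

# For a moderately large logit the result rounds to exactly 1.0
p = sigmoid(40.0)
print(p == 1.0)        # True
print(1.0 - p)         # 0.0, so log(1 - p) would be log(0.0)

# tf.log(0.0) gives -inf; a label of 1 makes (1 - y) zero,
# and zero times -inf is NaN
neg_inf = float("-inf")
print(0.0 * neg_inf)   # nan
```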

Here is what I did in Java code that makes heavy use of logarithms in a similar domain:

1. If the input is NaN or Infinity, return zero.
2. If the input is zero or negative, clip the output to a fixed low value, e.g. log(1e-5) ~= -11.51.
3. Otherwise, just take the log.

In Java the code looks like this; it shouldn't be difficult to translate to TensorFlow:

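import org.apache.commons.math3.util.FastMath;

public static double guardedLogarithm(double input) {
    if (Double.isNaN(input) || Double.isInfinite(input)) {
        // NaN or +/-Infinity: treat the output as zero
        return 0d;
    } else if (input <= 0d) {
        // zero (-0d compares equal to 0d) or negative input:
        // clip to a quite low value, log(1e-5) ~= -11.51
        return -11.51d;
    } else {
        return FastMath.log(input);
    }
}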
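A different technique from the guarded log, which avoids taking log(0) in the first place: compute the cost from the logit z = xW + b instead of from sigmoid(z). Expanding -(y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))) gives max(z, 0) - z*y + log(1 + exp(-|z|)), the formula documented for TensorFlow's tf.nn.sigmoid_cross_entropy_with_logits, which takes the logits (tf.matmul(xtr, W) + b, before the sigmoid) and the labels. The sketch below is plain Python over lists with hypothetical helper names, only to show the arithmetic:

```python
import math

def stable_logistic_loss(z: float, y: float) -> float:
    """Per-example logistic loss from the logit z (hypothetical helper).

    Equal to -(y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))), rearranged
    so that no log(0) or exp overflow can occur for large |z|.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def mean_cost(logits, labels):
    # Counterpart of the question's -tf.reduce_mean(...) over all examples
    losses = [stable_logistic_loss(z, y) for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)

print(mean_cost([0.0, 0.0], [0.0, 1.0]))    # ~0.6931 (log 2, like the first cost)
print(mean_cost([40.0, -40.0], [0.0, 1.0]))  # 40.0, large but finite
```

With all-zero weights every logit is 0, which is why the first cost equals log 2 ~= 0.6931; confidently wrong predictions give a large but finite cost instead of NaN.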