I am experimenting with some code in Jupyter and keep getting stuck here. Things actually work fine if I remove the line starting with "optimizer = ..." and all references to it, but as soon as I put that line back in, I get an error.
I am not pasting all the other functions here, to keep the code at a readable size. I hope someone more experienced can see at a glance what the problem is.
Note that there are 5, 4, 3, and 2 units in the input layer, the two hidden layers, and the output layer, respectively.
CODE:
tf.reset_default_graph()
num_units_in_layers = [5,4,3,2]
X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)
init = tf.global_variables_initializer()
my_sess = tf.Session()
my_sess.run(init)
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters, my_sess)
#my_sess.run(parameters) # Do I need to run this? Or is it obsolete?
cost = compute_cost(ZL, Y, my_sess, parameters, batch_size=3, lambd=0.05)
optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
_, minibatch_cost = my_sess.run([optimizer, cost],
                                feed_dict={X: minibatch_X,
                                           Y: minibatch_Y})
print(minibatch_cost)
my_sess.close()
ERROR:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-321-135b9fc18268> in <module>()
16 cost = compute_cost(ZL, Y, my_sess, parameters, 3, 0.05)
17
---> 18 optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
19 _ , minibatch_cost = my_sess.run([optimizer, cost],
20 feed_dict={X: minibatch_X,
~/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
362 "No gradients provided for any variable, check your graph for ops"
363 " that do not support gradients, between variables %s and loss %s." %
--> 364 ([str(v) for _, v in grads_and_vars], loss))
365
366 return self.apply_gradients(grads_and_vars, global_step=global_step,
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>", "<tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>", "<tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>", "<tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>", "<tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>", "<tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>"] and loss Tensor("Add_3:0", shape=(), dtype=float32).
Note that if I run
print(tf.trainable_variables())
just before the "optimizer = ..." line, I actually see my trainable variables there.
[<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>, <tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>, <tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>, <tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>, <tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>, <tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>]
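For what it's worth, here is a small diagnostic that could be run just before the "optimizer = ..." line (this snippet is not part of my original notebook, just an assumed way to inspect the graph): tf.gradients returns None for every variable that the cost tensor does not actually depend on.
# Hypothetical diagnostic, not in the original code: check whether the cost
# tensor is connected to the trainable variables in the graph.
grads = tf.gradients(cost, tf.trainable_variables())
print(grads)  # a None entry means the cost does not depend on that variable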
Would anyone have an idea about what can be the problem?
EDITING and ADDING SOME MORE INFO: In case you would like to see how I create and initialize my parameters, here is the code. Maybe there is something wrong with this part, but I don't see what.
def get_nn_parameter(variable_scope, variable_name, dim1, dim2):
    with tf.variable_scope(variable_scope, reuse=tf.AUTO_REUSE):
        v = tf.get_variable(variable_name,
                            [dim1, dim2],
                            trainable=True,
                            initializer=tf.contrib.layers.xavier_initializer())
    return v
def initialize_layer_parameters(num_units_in_layers):
    parameters = {}
    L = len(num_units_in_layers)
    for i in range(1, L):
        temp_weight = get_nn_parameter("weights",
                                       "W" + str(i),
                                       num_units_in_layers[i],
                                       num_units_in_layers[i-1])
        parameters.update({"W" + str(i): temp_weight})
        temp_bias = get_nn_parameter("biases",
                                     "b" + str(i),
                                     num_units_in_layers[i],
                                     1)
        parameters.update({"b" + str(i): temp_bias})
    return parameters
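For reference, here is a quick (hypothetical) check of what this returns for my network sizes; the shapes listed in the comment are the ones shown in the error message above.
params = initialize_layer_parameters([5, 4, 3, 2])
print(sorted(params.keys()))
# Expected keys: ['W1', 'W2', 'W3', 'b1', 'b2', 'b3'], with shapes
# W1: (4, 5), b1: (4, 1), W2: (3, 4), b2: (3, 1), W3: (2, 3), b3: (2, 1)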
ADDENDUM
I got it working. Instead of writing a separate answer, I am adding the correct version of my code here.
(David's answer below helped a lot.)
I simply removed my_sess as a parameter to my compute_cost function (I could not get it to work that way before, but apparently it is not needed at all), and I reordered the statements in my main code so that things are called in the right order.
Here is the working version of my cost function and how I call it:
def compute_cost(ZL, Y, parameters, mb_size, lambd):
    logits = tf.transpose(ZL)
    labels = tf.transpose(Y)
    cost_unregularized = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels))
    # Since the dict parameters includes both W and b, it needs to be divided by 2 to find L
    L = len(parameters) // 2
    list_sum_weights = []
    for i in range(0, L):
        list_sum_weights.append(tf.nn.l2_loss(parameters.get("W" + str(i + 1))))
    regularization_effect = tf.multiply((lambd / mb_size), tf.add_n(list_sum_weights))
    cost = tf.add(cost_unregularized, regularization_effect)
    return cost
And here is the main function where I call the compute_cost(..) function:
tf.reset_default_graph()
num_units_in_layers = [5,4,3,2]
X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)
my_sess = tf.Session()
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters)
cost = compute_cost(ZL, Y, parameters, 3, 0.05)
optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
init = tf.global_variables_initializer()
my_sess.run(init)
_, minibatch_cost = my_sess.run([optimizer, cost],
                                feed_dict={X: [[-1., 4., -7.], [2., 6., 2.], [3., 3., 9.], [8., 4., 4.], [5., 3., 5.]],
                                           Y: [[0.6, 0., 0.3], [0.4, 0., 0.7]]})
print(minibatch_cost)
my_sess.close()
I'm 99.9% sure you're creating your cost function incorrectly.
Your cost function should be a tensor. You are passing your session into the cost function, which looks like it is actually trying to run the TensorFlow session there, and that is the mistake.
Then, later, you are passing the result of compute_cost to your minimizer. This is a common misunderstanding about TensorFlow.
TensorFlow uses a declarative programming paradigm: you first declare all the operations you want to run, and only then pass data in and run them.
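To make that concrete, here is a minimal sketch (toy placeholders and a made-up loss, nothing from your network): every line before the session only declares ops, and nothing is computed until sess.run.
import tensorflow as tf

# Declaration phase: build tensors and ops; nothing is executed here.
x = tf.placeholder(tf.float32, shape=[None, 3])
w = tf.get_variable("w", shape=[3, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))   # a tensor, not a number
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Execution phase: now feed data and run the declared ops.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, loss_val = sess.run([train_op, loss], feed_dict={x: [[1., 2., 3.]]})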
Refactor your code to strictly follow this best practice:
(1) Create a build_graph() function; all of your math operations should go in it. Define your cost function and all the layers of the network there, and return the optimizer.minimize() training op (and any other ops you might want to get back, such as accuracy).
(2) Now create a session.
(3) After this point, do not create any more TensorFlow operations or variables; if you feel like you need to, you are doing something wrong.
(4) Call sess.run on your train op and pass in the placeholder data via feed_dict.
Here's a simple example of how to structure your code:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network_raw.ipynb
In general, the examples put up by aymericdamien are tremendously good; I strongly recommend reviewing them to learn the basics of TensorFlow.
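To tie it back to your code, a sketch of that structure could look roughly like this. It reuses your function names and layer sizes only as placeholders, and assumes minibatch_X / minibatch_Y hold your data; I have not run it against your helper functions.
tf.reset_default_graph()

def build_graph():
    # (1) Declare every op here and return the handles needed later.
    X = tf.placeholder(tf.float32, shape=[5, 3], name="X")
    Y = tf.placeholder(tf.float32, shape=[2, 3], name="Y")
    parameters = initialize_layer_parameters([5, 4, 3, 2])
    ZL = forward_propagation_with_relu(X, [5, 4, 3, 2], parameters)
    cost = compute_cost(ZL, Y, parameters, 3, 0.05)
    train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
    return X, Y, cost, train_op

X, Y, cost, train_op = build_graph()
with tf.Session() as sess:                       # (2) create the session
    sess.run(tf.global_variables_initializer())  # (3) no new ops after this point
    _, c = sess.run([train_op, cost],            # (4) run the train op with feed_dict
                    feed_dict={X: minibatch_X, Y: minibatch_Y})
    print(c)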