I was experimenting with a quaternionic neural network when I realized that, even though I close my current Session on every iteration of a for loop, my program slows down massively and leaks memory because of the ops being constructed. This is my code:
```python
for step in xrange(0, 200):  # num_epochs * train_size // BATCH_SIZE):

    with tf.Session() as sess:

        offset = (BATCH_SIZE) % train_size
        # print "Offset : %d" % offset

        batch_data = []
        batch_labels = []
        batch_data.append(qtrain[0][offset:(offset + BATCH_SIZE)])
        batch_labels.append(qtrain_labels[0][offset:(offset + BATCH_SIZE)])
        # ...
        retour = sess.run(test, feed_dict={x: batch_data})
        # ...
        test2 = feedForwardStep(retour, W_to_output, b_output)
        # ...
        # sess.close()
```
The problem seems to come from `test2 = feedForwardStep(...)`. I need to declare these ops after executing `retour` once, because `retour` can't be a placeholder (I need to iterate through it). Without this line, the program runs very well: fast and without a memory leak. I can't understand why TensorFlow seems to be trying to "save" `test2` even though I close the session...
TL;DR: Closing a session does not free the `tf.Graph` data structure in your Python program, and if each iteration of the loop adds nodes to the graph, you'll have a leak.

Since your function `feedForwardStep` creates new TensorFlow operations, and you call it within the `for` loop, there is a leak in your code, albeit a subtle one.

Unless you specify otherwise (using a `with tf.Graph().as_default():` block), all TensorFlow operations are added to a global default graph. This means that every call to `tf.constant()`, `tf.matmul()`, `tf.Variable()`, etc. adds objects to a global data structure that outlives any individual session. There are two ways to avoid this:

1. Structure your program so that you build the graph once, then use `tf.placeholder()` ops to feed in different values in each iteration. You mention in your question that this might not be possible.
2. Explicitly create a new graph in each iteration of the `for` loop. This might be necessary if the structure of the graph depends on the data available in the current iteration. You would do this as follows:
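Here is a minimal sketch of the per-iteration-graph approach. It is an illustration under stated assumptions, not the asker's actual model: it imports `tensorflow.compat.v1` so the TF 1.x graph API works on both TF 1.x and TF 2.x, `feed_forward_step` is a hypothetical stand-in for `feedForwardStep`, and the weight arrays and their shapes are made up. The parameters are kept as NumPy arrays so they survive from one graph to the next.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph API (also ships with TF 2.x)


def feed_forward_step(inputs, weights, bias):
    # Hypothetical stand-in for the question's feedForwardStep():
    # every call adds new ops to whatever graph is currently the default.
    return tf.nn.relu(tf.matmul(inputs, weights) + bias)


rng = np.random.RandomState(0)
# Hypothetical parameters, stored as NumPy arrays so they outlive each graph.
W_hidden = rng.randn(4, 3).astype(np.float32)
b_hidden = np.zeros(3, dtype=np.float32)
W_to_output = rng.randn(3, 2).astype(np.float32)
b_output = np.zeros(2, dtype=np.float32)

op_counts = []
for step in range(5):
    # Everything created inside this block belongs to `g`, not to the global
    # default graph, so it becomes garbage once nothing references `g`.
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, shape=[None, 4])
        test = feed_forward_step(x, W_hidden, b_hidden)
        with tf.Session(graph=g) as sess:
            batch_data = rng.randn(3, 4).astype(np.float32)
            retour = sess.run(test, feed_dict={x: batch_data})
            # retour is a concrete NumPy array here, so ops that depend on
            # its value can be built now, inside the same per-iteration graph.
            test2 = feed_forward_step(retour, W_to_output, b_output)
            output = sess.run(test2)
        op_counts.append(len(g.get_operations()))

print(op_counts)  # one fresh, same-sized graph per iteration
```

Because each iteration starts from an empty graph, the op count stays the same from step to step instead of accumulating, and `retour` can be used directly as an input to the newly built ops.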
Note that in this version, you cannot use `Tensor` or `Operation` objects from a previous iteration. (For example, it's not clear from your code snippet where `test` comes from.)
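To check that the slowdown really comes from graph growth and not from the sessions themselves, you can count the ops in the graph after each iteration. In the sketch below, which rests on a few assumptions, an explicit long-lived `tf.Graph` (hypothetically named `leaky`) stands in for TF 1.x's global default graph, since TF 2.x does not build into that graph while eager execution is on. `feed_forward_step` is again a hypothetical helper. The sketch also calls `tf.Graph.finalize()`, which makes any later attempt to add an op to that graph raise a `RuntimeError`; this is one cheap way to catch this kind of leak early.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph API (also ships with TF 2.x)


def feed_forward_step(inputs, weights, bias):
    # Hypothetical stand-in for feedForwardStep(): adds new ops on each call.
    return tf.nn.relu(tf.matmul(inputs, weights) + bias)


rng = np.random.RandomState(0)
W = rng.randn(4, 2).astype(np.float32)  # hypothetical weights
b = np.zeros(2, dtype=np.float32)       # hypothetical bias

# One long-lived graph, playing the role of TF 1.x's global default graph.
leaky = tf.Graph()
growth = []
with leaky.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4])
    for step in range(3):
        with tf.Session() as sess:  # a new session on every iteration...
            batch = rng.randn(3, 4).astype(np.float32)
            sess.run(feed_forward_step(x, W, b), feed_dict={x: batch})
        # ...but the ops built above stay in `leaky` after the session closes.
        growth.append(len(leaky.get_operations()))

print(growth)  # grows on every iteration

# Once finalized, the graph rejects new ops instead of silently growing.
leaky.finalize()
try:
    with leaky.as_default():
        feed_forward_step(x, W, b)
    caught = False
except RuntimeError:
    caught = True
```

If you finalize the graph right after building it in the build-once approach, any op that is accidentally created inside the training loop fails immediately instead of slowly consuming memory.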