TensorFlow leaks 1280 bytes with each session opened

Posted 2019-06-17 10:23

Question:

It seems that each TensorFlow session I open and close consumes 1280 bytes of GPU memory, which are not released until the Python kernel is terminated.

To reproduce, save the following Python script as memory_test.py:

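import sys
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.contrib)

# Number of sessions to open and close before measuring.
n_iterations = int(sys.argv[1])

def open_and_close_session():
    # Open a session and close it immediately without running anything.
    with tf.Session() as sess:
        pass

for _ in range(n_iterations):
    open_and_close_session()

# Open one more session and report the bytes the allocator currently has in use.
with tf.Session() as sess:
    print("bytes used=", sess.run(tf.contrib.memory_stats.BytesInUse()))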

Then run it from the command line with different numbers of iterations:

  • python memory_test.py 0 yields bytes used= 1280.
  • python memory_test.py 1 yields bytes used= 2560.
  • python memory_test.py 10 yields bytes used= 14080.
  • python memory_test.py 100 yields bytes used= 129280.
  • python memory_test.py 1000 yields bytes used= 1281280.

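The math is simple: each session opened and closed leaks 1280 bytes, so n iterations report 1280 × (n + 1) bytes (the 1280 bytes already present at 0 iterations presumably come from the final session that performs the measurement). I tested this script on two different Ubuntu 17.10 workstations with tensorflow-gpu 1.6 and 1.7 and different NVIDIA GPUs.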
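The linear pattern in these numbers can be checked with a few lines of plain Python, no TensorFlow required. This sketch only reuses the five measurements listed above; the name per_session_increments is made up for illustration, and reading the n = 0 value as the cost of the final measuring session is an interpretation of the data, not something BytesInUse reports directly.

```python
# Measurements from the runs above: iterations n -> "bytes used" reported.
MEASUREMENTS = [(0, 1280), (1, 2560), (10, 14080), (100, 129280), (1000, 1281280)]


def per_session_increments(points):
    """Return the byte increase per extra session between consecutive runs (hypothetical helper)."""
    increments = []
    for (n1, b1), (n2, b2) in zip(points, points[1:]):
        delta, remainder = divmod(b2 - b1, n2 - n1)
        assert remainder == 0, "increase is not a whole number of bytes per session"
        increments.append(delta)
    return increments


increments = per_session_increments(MEASUREMENTS)
print(increments)  # [1280, 1280, 1280, 1280]

leak = increments[0]
# Value at n = 0: bytes in use before any loop session was opened.
baseline = MEASUREMENTS[0][1] - leak * MEASUREMENTS[0][0]
print(leak, baseline)  # 1280 1280

# Closed form, assuming the final measuring session leaks the same amount:
# n loop sessions plus one measuring session.
for n, used in MEASUREMENTS:
    assert used == leak * (n + 1)
```

Every pair of consecutive runs yields exactly 1280 bytes per extra session with no remainder, which is what a fixed per-session leak would produce.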

Did I miss some explicit garbage collection, or is this a TensorFlow bug?

Edit: Note that, unlike the case described in this question, I add nothing to the default global graph within the loop, unless the tf.Session() objects themselves "count". If they do, how can one delete them? Neither tf.reset_default_graph() nor using with tf.Graph().as_default(), tf.Session() as sess: helps.

Answer 1:

Turning my comment into an answer:

I can reproduce this behavior. I suggest opening an issue on the GitHub issue tracker. TF uses its own allocator mechanism, and the documentation of Session.close() clearly states:

Calling this method frees all resources associated with the session.

This is apparently not the case here. However, even the 1281280 bytes could potentially be reused from the memory pool by a subsequent session.

So the answer is: it seems to be a bug (still present in the recent 1.8.0-rc0 version of TensorFlow), either in close() or in the memory_stats implementation.