The following test code leaks memory:
private static final float[] X = new float[]{1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0};
public void testTensorFlowMemory() {
    // create a graph and session
    try (Graph g = new Graph(); Session s = new Session(g)) {
        // create a placeholder x and a const for the dimension to do a cumulative sum along
        Output x = g.opBuilder("Placeholder", "x").setAttr("dtype", DataType.FLOAT).build().output(0);
        Output dims = g.opBuilder("Const", "dims").setAttr("dtype", DataType.INT32).setAttr("value", Tensor.create(0)).build().output(0);
        Output y = g.opBuilder("Cumsum", "y").addInput(x).addInput(dims).build().output(0);
        // loop a bunch to test memory usage
        for (int i = 0; i < 10000000; i++) {
            // create a tensor from X
            Tensor tx = Tensor.create(X);
            // run the graph and fetch the resulting y tensor
            Tensor ty = s.runner().feed("x", tx).fetch("y").run().get(0);
            // close the tensors to release their resources
            tx.close();
            ty.close();
        }
        System.out.println("non-threaded test finished");
    }
}
Is there something obvious I'm doing wrong? The basic flow is to create a graph and a session on that graph, create a placeholder and a constant in order to do a cumulative sum on a tensor fed in as x. After running the resulting y operation, I close both the x and y tensors to free their memory resources.
Things I've established so far:
- This is not a Java object memory problem. According to jvisualvm, the heap does not grow and no other JVM memory pool is growing, and Java's Native Memory Tracking doesn't show a JVM memory leak either.
- The close operations are helping: without them the memory grows by leaps and bounds. With them in place it still grows fairly fast, but not nearly as much as without them (a try-with-resources version of the loop body is sketched after this list).
- The Cumsum operator is not the issue; it happens with Sum and other operators as well.
- It happens on Mac OS with TF 1.1, and on CentOS 7 with TF 1.1 and 1.2_rc0.
- Commenting out the Tensor ty lines removes the leak, so it appears to be in there.
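For reference, here is a minimal sketch of the same loop body using try-with-resources instead of explicit close() calls (same org.tensorflow 1.x API as the test above; Tensor implements AutoCloseable, so this is equivalent to calling close(), it just can't be skipped if run() throws):

for (int i = 0; i < 10000000; i++) {
    // both tensors are released at the end of each iteration, even on an exception
    try (Tensor tx = Tensor.create(X);
         Tensor ty = s.runner().feed("x", tx).fetch("y").run().get(0)) {
        // nothing to do with ty here; the test only exercises allocation and release
    }
}

This doesn't change the growth described above; it just makes the cleanup explicit.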
Any ideas? Thanks! Also, here's a GitHub project that demonstrates this issue with both a threaded test (to grow the memory faster) and an unthreaded test (to show it's not due to threading). It uses Maven and can be run with a simple:
mvn test
I believe there is indeed a leak: in particular, a missing TF_DeleteStatus corresponding to an allocation in the JNI code (thanks for the detailed instructions to reproduce). I'd encourage you to file an issue at http://github.com/tensorflow/tensorflow/issues, and hopefully it will be fixed before the final 1.2 release.
(Relatedly, you also have a leak outside the loop, since the Tensor object created by Tensor.create(0) for the dims constant is never closed.)

UPDATE: This was fixed, and 1.2.0-rc1 should no longer have this problem.
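On the parenthetical above, here is a minimal sketch (same org.tensorflow 1.x API as the question) of scoping that value tensor so it gets released. My understanding is that the tensor's contents are copied into the Const node's "value" attr when the op is built, so the Tensor is no longer needed after build():

Output dims;
try (Tensor dimsValue = Tensor.create(0)) {
    // dimsValue can be closed as soon as the Const op has been built,
    // since its contents have been copied into the graph definition
    dims = g.opBuilder("Const", "dims")
            .setAttr("dtype", DataType.INT32)
            .setAttr("value", dimsValue)
            .build()
            .output(0);
}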