After 500 steps, TensorFlow fails to write summaries

Posted 2019-07-11 16:42

Question:

I am training a model in TensorFlow and writing the results to TensorBoard with this code:

   test_writer.add_summary(summary_strTest, i)
   train_writer.add_summary(summary_str, i)
   train_writer.flush()
   test_writer.flush()

After 500 steps, writing the summaries starts to fail with this error:

tensorflow/core/util/events_writer.cc:97] Write failed because file could not be opened.
E tensorflow/core/util/events_writer.cc:63] Could not open events file: ./logs/train/events.out.tfevents.1468372504.al: Resource exhausted: ./logs/train/

As far as I can tell, "Resource exhausted" is caused by running out of memory, but I have more than 2 GB free.

Then, 100 steps later, when it has to write the checkpoint, it crashes.

In the terminal running the TensorBoard server, I get this message:

WARNING:tensorflow:Found more than one graph event per run. Overwriting the graph with the newest event.

I don't know why it can't write to the file after 500 steps. After the run, my logs folders (test and train) each contain 505 files.
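
For reference, the file counts can be checked with a small script. This is only a minimal sketch: the helper name `count_event_files` is made up for illustration, the `logs/train` and `logs/test` paths are assumed from the error message above, and the `events.out.tfevents` prefix is taken from the file name in that message.

```python
import os


def count_event_files(log_dir):
    """Count files in log_dir whose names start with the event-file prefix.

    The prefix is assumed from the file name in the error message
    (events.out.tfevents.<timestamp>.<host>).
    """
    return sum(
        1
        for name in os.listdir(log_dir)
        if name.startswith("events.out.tfevents")
    )


if __name__ == "__main__":
    # Assumed layout: ./logs/train and ./logs/test, as in the question.
    for sub in ("train", "test"):
        path = os.path.join("logs", sub)
        if os.path.isdir(path):
            print(path, count_event_files(path))
```

Running it during training shows whether the number of event files grows with the step count.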

Answer 1:

The solution was to reduce the batch size. I reduced it to 100, and after that the summaries were written to the file.
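
A minimal sketch of what feeding the data in batches of 100 might look like. The helper `iter_batches` and the plain-list input are assumptions for illustration, not the original training code; only the batch size of 100 comes from the answer.

```python
def iter_batches(samples, batch_size=100):
    """Yield consecutive slices of samples, each at most batch_size long."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


if __name__ == "__main__":
    data = list(range(250))  # placeholder data, stands in for the training set
    for step, batch in enumerate(iter_batches(data, batch_size=100)):
        # In the real loop, each batch would be fed to the training step here.
        print(step, len(batch))
```

The last batch may be smaller than 100 when the dataset size is not a multiple of the batch size.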