Epoch time increases when using a for loop in PyCharm

Posted 2019-09-22 03:07

Question:

The increase in network size is not the cause of the problem.

Here is my code:

import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Dropout

count = 0

for i in [32, 64, 128, 256, 512]:
    for j in [32, 64, 128, 256, 512]:
        for k in [32, 64, 128, 256, 512]:
            for l in [0.1, 0.2, 0.3, 0.4, 0.5]:

                # build a fresh model for each combination of layer sizes and dropout rate
                model = Sequential()
                model.add(Dense(i))
                model.add(Dropout(l))

                model.add(Dense(j))
                model.add(Dropout(l))

                model.add(Dense(k))
                model.add(Dropout(l))

                model.compile(~)

                hist = model.fit(~)

                # save the training figure and the metrics for this run
                plt.savefig(str(count) + '.png')
                plt.clf()

                f = open(str(count) + '.csv', 'w')
                text = ~
                f.write(text)
                f.close()
                count += 1
                print()
                print("count :" + str(count))
                print()

I initialized count to 0.

When count is 460–479, the epoch times are:

Train on 7228 samples, validate on 433 samples
Epoch 1/10
 - 2254s - loss: 0.0045 - acc: 1.3835e-04 - val_loss: 0.0019 - val_acc: 0.0000e+00
Epoch 2/10
 - 86s - loss: 0.0020 - acc: 1.3835e-04 - val_loss: 0.0030 - val_acc: 0.0000e+00
Epoch 3/10
 - 85s - loss: 0.0017 - acc: 1.3835e-04 - val_loss: 0.0016 - val_acc: 0.0000e+00
Epoch 4/10
 - 86s - loss: 0.0015 - acc: 1.3835e-04 - val_loss: 1.6094e-04 - val_acc: 0.0000e+00
Epoch 5/10
 - 86s - loss: 0.0014 - acc: 1.3835e-04 - val_loss: 1.4120e-04 - val_acc: 0.0000e+00
Epoch 6/10
 - 85s - loss: 0.0013 - acc: 1.3835e-04 - val_loss: 3.8155e-04 - val_acc: 0.0000e+00
Epoch 7/10
 - 85s - loss: 0.0012 - acc: 1.3835e-04 - val_loss: 4.1694e-04 - val_acc: 0.0000e+00
Epoch 8/10
 - 85s - loss: 0.0012 - acc: 1.3835e-04 - val_loss: 4.8163e-04 - val_acc: 0.0000e+00
Epoch 9/10
 - 86s - loss: 0.0011 - acc: 1.3835e-04 - val_loss: 3.8670e-04 - val_acc: 0.0000e+00
Epoch 10/10
 - 85s - loss: 9.9018e-04 - acc: 1.3835e-04 - val_loss: 0.0016 - val_acc: 0.0000e+00

But when I restart PyCharm and count is 480, the epoch times are:

Train on 7228 samples, validate on 433 samples
Epoch 1/10
 - 151s - loss: 0.0071 - acc: 1.3835e-04 - val_loss: 0.0018 - val_acc: 0.0000e+00
Epoch 2/10
 - 31s - loss: 0.0038 - acc: 1.3835e-04 - val_loss: 0.0014 - val_acc: 0.0000e+00
Epoch 3/10
 - 32s - loss: 0.0031 - acc: 1.3835e-04 - val_loss: 2.0248e-04 - val_acc: 0.0000e+00
Epoch 4/10
 - 32s - loss: 0.0026 - acc: 1.3835e-04 - val_loss: 3.7600e-04 - val_acc: 0.0000e+00
Epoch 5/10
 - 32s - loss: 0.0021 - acc: 1.3835e-04 - val_loss: 4.3882e-04 - val_acc: 0.0000e+00
Epoch 6/10
 - 32s - loss: 0.0020 - acc: 1.3835e-04 - val_loss: 0.0037 - val_acc: 0.0000e+00
Epoch 7/10
 - 32s - loss: 0.0021 - acc: 1.3835e-04 - val_loss: 1.2072e-04 - val_acc: 0.0000e+00
Epoch 8/10
 - 32s - loss: 0.0019 - acc: 1.3835e-04 - val_loss: 0.0031 - val_acc: 0.0000e+00
Epoch 9/10
 - 33s - loss: 0.0018 - acc: 1.3835e-04 - val_loss: 0.0051 - val_acc: 0.0000e+00
Epoch 10/10
 - 33s - loss: 0.0018 - acc: 1.3835e-04 - val_loss: 3.2728e-04 - val_acc: 0.0000e+00

All I did was restart it, yet the epochs ran much faster.

I don't know why this happened.

I use Python 3.6 with tensorflow-gpu 1.13.1 and CUDA 10.0. The OS is Windows 10 Pro 1903 (build 18362.239), and PyCharm is the 2019.1.1 Community edition.

I only used the for loop, and I wonder why this happens.

I changed the number of units in the for loop.

I also saved the figure with plt.savefig and saved the data in .csv format.

How can I solve this?

Answer 1:

You should use:

from keras import backend as K
K.clear_session()

before creating the model (i.e., before model = Sequential()). That's because:

Ops are not garbage collected by TF, so you keep adding more nodes to the graph.

So if you don't call K.clear_session(), the graph keeps growing across loop iterations and a memory leak occurs, which is why each successive run gets slower until the process is restarted.
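For example, here is a minimal sketch of how the loop from the question might look with K.clear_session() added. The optimizer, loss, output layer, and dummy data are assumptions for illustration only, since the original compile/fit arguments were elided:

import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout

# placeholder data roughly shaped like the post's training set (assumed)
x_train = np.random.rand(7228, 20)
y_train = np.random.rand(7228, 1)

count = 0
for i in [32, 64, 128, 256, 512]:
    for j in [32, 64, 128, 256, 512]:
        for k in [32, 64, 128, 256, 512]:
            for l in [0.1, 0.2, 0.3, 0.4, 0.5]:

                # reset the default graph before building the next model,
                # so ops from earlier iterations do not accumulate
                K.clear_session()

                model = Sequential()
                model.add(Dense(i, input_dim=x_train.shape[1]))
                model.add(Dropout(l))
                model.add(Dense(j))
                model.add(Dropout(l))
                model.add(Dense(k))
                model.add(Dropout(l))
                model.add(Dense(1))  # assumed output layer

                # assumed optimizer/loss; the original arguments were not shown
                model.compile(optimizer='adam', loss='mse')
                hist = model.fit(x_train, y_train, epochs=10,
                                 validation_split=0.05, verbose=2)

                count += 1

Calling K.clear_session() at the top of the innermost loop means each model is built on a fresh graph, so the per-epoch time should stay roughly constant across iterations instead of creeping up.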

Thanks to @dref360 on the keras.io Slack.