
Question:

While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the error below is displayed at the end of each epoch:

W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Also, after some epochs, it stops showing logs and reports this error:

This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.

Answer 1:

In my case, the batch_size and steps_per_epoch did not match.

For example,

his = Test_model.fit_generator(datagen.flow(trainrancrop_images, trainrancrop_labels, batch_size=batchsize), steps_per_epoch=len(trainrancrop_images)/batchsize, validation_data=(test_images, test_labels), epochs=1, callbacks=[callback])

The batch_size in datagen.flow must correspond to the steps_per_epoch in Test_model.fit_generator (in my case, I had used the wrong value for steps_per_epoch).

I guess this is one of the causes of the error.

So I think the problem arises when the batch size and the number of steps (iterations) do not correspond.

Getting a float when you compute the number of steps by division may also be a problem.
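As a minimal sketch of that idea, the step count can be computed with integer ceiling division so that it is always a whole number and covers a final, smaller batch. The helper name `steps_per_epoch` is hypothetical and not part of Keras; it assumes the generator yields one partial batch at the end of each pass over the data.

```python
def steps_per_epoch(num_samples, batch_size):
    """Return the number of batches needed to see every sample once.

    Hypothetical helper (not a Keras API). Uses integer arithmetic only,
    so the result is never a float.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Ceiling division: a trailing partial batch still counts as one step.
    return (num_samples + batch_size - 1) // batch_size
```

With the call shown earlier, this could be passed as steps_per_epoch=steps_per_epoch(len(trainrancrop_images), batchsize) in place of the plain division.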

Check your code for this issue.

Good luck :)

Answer 2:

This was due to incompatible CUDA and TensorFlow versions. The versions below work well with each other:

tensorflow-gpu==2.0.0

tensorflow-addons==0.6.0

nvidia/cuda:10.0-cudnn7-runtime
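To confirm that a container actually has these package versions, something like the following sketch could be run inside it. It assumes Python 3.8+ (for importlib.metadata); the names `PINS`, `installed_version`, and `find_mismatches` are made up for illustration. The CUDA image tag is not a Python package, so it is not checked here.

```python
from importlib import metadata

# Pins copied from the versions listed above (assumed to be pip package names).
PINS = {"tensorflow-gpu": "2.0.0", "tensorflow-addons": "0.6.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result from find_mismatches(PINS) means both packages match the pinned versions.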

Answer 3:

In my case, installing tf-nightly fixed it, though I am new to TensorFlow. I followed this link

You can try it.

Answer 4:

I have the same problem. People claimed that the warning is superfluous and that it has been removed in tf-nightly, see here. But the memory leak is still there in each epoch.