Error occurred when finalizing GeneratorDataset iterator

Posted 2020-08-25 05:25

While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the error below is displayed at the end of each epoch:

W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Also, after some epochs, it stops showing logs and shows this error:

This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.

5 answers
我只想做你的唯一
#2 · 2020-08-25 06:00

In my case, batch_size and steps_per_epoch did not match.

For example,

his = Test_model.fit_generator(datagen.flow(trainrancrop_images, trainrancrop_labels, batch_size=batchsize), steps_per_epoch=len(trainrancrop_images)/batchsize, validation_data=(test_images, test_labels), epochs=1, callbacks=[callback])

The batch_size passed to datagen.flow must correspond to the steps_per_epoch passed to Test_model.fit_generator (in my case, I had used the wrong value for steps_per_epoch).

I guess this is one of the possible causes of the error.

So I think the problem arises when the batch size and the number of steps (iterations) do not correspond.

Floats may also be a problem when you compute the number of steps by division.

Check your code for this issue.
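A minimal sketch of the batch_size/steps_per_epoch correspondence described in this answer, in plain Python so it runs without TensorFlow. compute_steps_per_epoch is a hypothetical helper and the sample counts are illustrative; it assumes the generator yields a final partial batch, which I believe is how the iterator returned by datagen.flow behaves (worth verifying for your Keras version).

```python
def compute_steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Hypothetical helper: number of batches that cover every sample once.

    Uses ceiling division in pure integer arithmetic, so the last, partial
    batch is counted and the result is always an int, unlike
    len(x) / batch_size, which yields a float such as 31.25.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return (num_samples + batch_size - 1) // batch_size


# Illustrative numbers only: 1000 samples with batches of 32.
num_samples, batch_size = 1000, 32
print(num_samples / batch_size)                          # 31.25 (a float)
print(compute_steps_per_epoch(num_samples, batch_size))  # 32 (an int)
```

In the fit_generator call above, this would mean passing steps_per_epoch=compute_steps_per_epoch(len(trainrancrop_images), batchsize) instead of len(trainrancrop_images)/batchsize. If you would rather count only full batches, num_samples // batch_size is the integer alternative.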

Good luck :)

三岁会撩人
#3 · 2020-08-25 06:02

In my case, I installed tf-nightly and now it works, though I am new to TensorFlow. I followed this link.

You can try it.

唯我独甜
#4 · 2020-08-25 06:06

I have the same problem. People claimed that the warning is superfluous and that it has been removed in tf-nightly (see here). But the memory leak is still there for each epoch.

爷的心禁止访问
#5 · 2020-08-25 06:09

This was due to incompatible CUDA and TensorFlow versions. The versions below work well with each other:

tensorflow-gpu==2.0.0

tensorflow-addons==0.6.0

nvidia/cuda:10.0-cudnn7-runtime
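One way to confirm that a container actually has the Python package versions listed above is a small standard-library check. This is a sketch assuming Python 3.8+ (for importlib.metadata); find_version_mismatches and PINNED_VERSIONS are hypothetical names, and the CUDA/cuDNN base image cannot be verified this way because it is not a Python package.

```python
from importlib import metadata

# Hypothetical pin table built from the answer above. The base image
# nvidia/cuda:10.0-cudnn7-runtime is not a Python package, so it is not listed.
PINNED_VERSIONS = {
    "tensorflow-gpu": "2.0.0",
    "tensorflow-addons": "0.6.0",
}


def find_version_mismatches(pinned: dict) -> dict:
    """Hypothetical helper: map each package whose installed version differs
    from its pin to (expected, installed); installed is None if missing."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_version_mismatches(PINNED_VERSIONS).items():
        print(f"{package}: expected {expected}, found {installed}")
```

Running this at the start of a pipeline step would surface a version drift before training begins, rather than after several epochs.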

【Aperson】
#6 · 2020-08-25 06:09

Upgrading TensorFlow from 2.1 to 2.2 fixed this issue for me; I didn't have to switch to the tf-nightly version.
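If you want a pipeline step to fail fast on an older TensorFlow, a rough version gate like the one below could work. It is a sketch under stated assumptions: release_tuple and tensorflow_at_least are hypothetical helpers, the distribution names checked (tensorflow, tensorflow-gpu, tf-nightly) are the common pip names, and pre-release suffixes such as rc1 are simply ignored.

```python
import re
from importlib import metadata


def release_tuple(version: str, width: int = 3) -> tuple:
    """Hypothetical helper: turn '2.1.0', '2.2.0rc1' or '2.4.0.dev20200825'
    into a zero-padded tuple of the leading numeric parts, e.g. (2, 2, 0).
    Pre-release/dev suffixes are ignored, so '2.2.0rc1' equals '2.2.0'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # suffix such as 'rc1': stop here
            break
    parts += [0] * (width - len(parts))
    return tuple(parts[:width])


def tensorflow_at_least(minimum: str = "2.2") -> bool:
    """Hypothetical helper: True if the first installed TensorFlow
    distribution found reports a release >= minimum."""
    for dist in ("tensorflow", "tensorflow-gpu", "tf-nightly"):
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
        return release_tuple(installed) >= release_tuple(minimum)
    return False  # no TensorFlow distribution installed
```

For example, a training script could raise an error at the top when tensorflow_at_least("2.2") is False. Where the third-party packaging library is available, packaging.version.Version is the more rigorous way to compare versions, since it orders pre-releases correctly.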
