I follow the tutorial for building kubeflow on GCP.
At the last step, after deploying the code and training with CPU.
kustomize build . |kubectl apply -f -
The distributed tensorflow encounter this issue
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/tmprIn1Il/model.ckpt-1_temp_a890dac1971040119aba4921dd5f631a; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, conv_layer1/conv2d/bias, conv_layer1/conv2d/kernel, conv_layer2/conv2d/bias, conv_layer2/conv2d/kernel, dense/bias, dense/kernel, dense_1/bias, dense_1/kernel, global_step)]]
I found the similar bug report but don't know how to resolve this.
From the bug report.
Just put the data on GCS and it work fine.
model.py