I tried to train my model on Google Cloud ML using this sample code:
```python
import keras
from keras import optimizers
from keras import losses
from keras import metrics
from keras.models import Model, Sequential
from keras.layers import Dense, Lambda, RepeatVector, TimeDistributed
import numpy as np

def test():
    model = Sequential()
    model.add(Dense(2, input_shape=(3,)))
    model.add(RepeatVector(3))
    model.add(TimeDistributed(Dense(3)))
    model.compile(loss=losses.MSE,
                  optimizer=optimizers.RMSprop(lr=0.0001),
                  metrics=[metrics.categorical_accuracy],
                  sample_weight_mode='temporal')
    x = np.random.random((1, 3))
    y = np.random.random((1, 3, 3))
    model.train_on_batch(x, y)

if __name__ == '__main__':
    test()
```
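To rule out a shape mismatch as the cause, here is a NumPy-only sketch of the shapes this model produces. It assumes Keras's default linear activation for `Dense` and uses random stand-in weights (the real layers learn their own); only the shapes matter. The final output is `(1, 3, 3)`, which matches `y` and is the 3D, per-timestep output that `sample_weight_mode='temporal'` works with.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 3))                    # batch of 1 sample with 3 features

# Dense(2): linear map x @ W1 + b1 -> shape (1, 2)
W1, b1 = rng.random((3, 2)), np.zeros(2)
h = x @ W1 + b1

# RepeatVector(3): copy the 2-vector 3 times along a new time axis -> (1, 3, 2)
r = np.repeat(h[:, np.newaxis, :], 3, axis=1)

# TimeDistributed(Dense(3)): the same weights applied at every timestep -> (1, 3, 3)
W2, b2 = rng.random((2, 3)), np.zeros(3)
out = r @ W2 + b2

print(h.shape, r.shape, out.shape)
```

Since the shapes line up with `y`, the model definition itself is a less likely source of the failure than the environment it runs in.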
And I got this error:
The replica master 0 exited with a non-zero status of 245. Termination reason: Error.
The detailed error output is long, so I've pasted it on Pastebin here.
Update: the problem is resolved. All I had to do was use TensorFlow 1.1.0 instead of the default 1.0.1.
Note this output:
And I guess Google Cloud will run your code with an extra parameter called `--job-dir`. So perhaps you can try adding the following code to your example code? I'm not 100% sure this will work, but I think you can give it a try. Also, I have a post here about something similar; perhaps you can take a look there as well.
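If the service does pass an extra `--job-dir` argument, one way a script can tolerate it is Python's standard `argparse` with `parse_known_args`, which accepts the flags you define and returns any others instead of exiting with an error. This is a minimal sketch under that assumption, not necessarily the code the answer had in mind; `parse_args` is a hypothetical helper name and `gs://my-bucket/output` is a placeholder path.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical helper: accept --job-dir and ignore other unknown flags."""
    parser = argparse.ArgumentParser()
    # Declare --job-dir so it is not treated as an unrecognized argument.
    # argparse stores it under the attribute name "job_dir".
    parser.add_argument('--job-dir', default=None,
                        help='Output directory passed in by the training service')
    # parse_known_args returns (namespace, leftover_args) rather than
    # raising an error for flags this parser does not define.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


if __name__ == '__main__':
    # Placeholder arguments standing in for what the service might pass.
    args, unknown = parse_args(['--job-dir', 'gs://my-bucket/output', '--extra', '1'])
    print(args.job_dir)
    print(unknown)
```

In the question's script, the call would go at the top of the `if __name__ == '__main__':` block, before `test()`, with `argv=None` so that `sys.argv` is parsed.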