Google Cloud ML exited with a non-zero status of 2

Posted 2019-08-03 09:30

I tried to train my model on Google Cloud ML using this sample code:

import keras
from keras import optimizers
from keras import losses
from keras import metrics
from keras.models import Model, Sequential
from keras.layers import Dense, Lambda, RepeatVector, TimeDistributed
import numpy as np

def test():
    model = Sequential()
    model.add(Dense(2, input_shape=(3,)))
    model.add(RepeatVector(3))
    model.add(TimeDistributed(Dense(3)))
    model.compile(loss=losses.MSE,
                  optimizer=optimizers.RMSprop(lr=0.0001),
                  metrics=[metrics.categorical_accuracy],
                  sample_weight_mode='temporal')
    x = np.random.random((1, 3))
    y = np.random.random((1, 3, 3))
    model.train_on_batch(x, y)

if __name__ == '__main__':
    test()

and I got this error:

The replica master 0 exited with a non-zero status of 245. Termination reason: Error.

The detailed error output is large, so I'm pasting it here on Pastebin.

2 answers
闹够了就滚
Reply #2 · 2019-08-03 09:49

The problem is resolved. All I had to do was use TensorFlow 1.1.0 instead of the default 1.0.1.
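Since the fix came down to which TensorFlow version the job actually ran, it may help to log the versions at the start of the trainer. The following is only a minimal sketch: the helper name `installed_version` is hypothetical and not part of the original code, and it assumes the packages expose a `__version__` attribute.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, 'unknown' if absent, or None if not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, '__version__', 'unknown')


if __name__ == '__main__':
    # Print the versions so they show up in the job logs.
    for name in ('tensorflow', 'keras'):
        print('%s version: %s' % (name, installed_version(name)))
```

If the logged TensorFlow version is not the one you expected, the environment is the first thing to check before debugging the model code.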

不美不萌又怎样
Reply #3 · 2019-08-03 10:02

Note this output:

Module raised an exception for failing to call a subprocess Command '['python', '-m', u'trainer.test', '--job-dir', u'gs://my_test_bucket_keras/s_27_100630']' returned non-zero exit status -11.

My guess is that Google Cloud runs your code with an extra parameter called --job-dir. So perhaps you could try adding the following to your example code:

import ...
import argparse

def test():
    model = Sequential()
    model.add(Dense(2, input_shape=(3,)))
    model.add(RepeatVector(3))
    model.add(TimeDistributed(Dense(3)))
    model.compile(loss=losses.MSE,
                  optimizer=optimizers.RMSprop(lr=0.0001),
                  metrics=[metrics.categorical_accuracy],
                  sample_weight_mode='temporal')
    x = np.random.random((1, 3))
    y = np.random.random((1, 3, 3))
    model.train_on_batch(x, y)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Input Arguments
    parser.add_argument(
      '--job-dir',
      help='GCS location to write checkpoints and export models',
      required=True
    )
    args = parser.parse_args()
    arguments = args.__dict__

    test()
    # test(**arguments)  # use this instead if you change test() to accept a job_dir parameter

I'm not 100% sure this will work, but I think it's worth a try. I also have a post here that does something similar; perhaps you can take a look there as well.
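A slightly more forgiving variant, in case the service passes other flags besides --job-dir: this sketch uses argparse's `parse_known_args` so that unrecognized arguments are reported rather than causing an exit. The `parse_args` wrapper is a hypothetical helper, and making --job-dir optional is my assumption so the script still runs locally without it.

```python
import argparse


def parse_args(argv=None):
    """Parse --job-dir and tolerate any other flags the caller passes."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--job-dir',
        default=None,  # assumption: optional, so local runs work without it
        help='GCS location to write checkpoints and export models',
    )
    # parse_known_args returns (namespace, list_of_unrecognized_args)
    args, unknown = parser.parse_known_args(argv)
    if unknown:
        print('Ignoring unrecognized arguments: %s' % unknown)
    return args


if __name__ == '__main__':
    args = parse_args()
    print('job_dir = %s' % args.job_dir)
    # Call the training function from the question here, e.g. test()
```

For example, `parse_args(['--job-dir', 'gs://some-bucket/run1', '--other', '1'])` yields a namespace whose `job_dir` is `'gs://some-bucket/run1'`, with the extra flag ignored.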
