I am new to Google Cloud ML Engine. I would like to submit a Keras model to the cloud for training, but the job always fails with this error:
I master-replica-0 Running module trainer.bot. master-replica-0
I master-replica-0 Downloading the package: gs://zadravecm-bot/jobs/test_job4/packages/84f3c60920e885020405e1eb7afa5f509313d2a5406a1f1551a81b81993ac66c/trainer-1.0.tar.gz master-replica-0
I master-replica-0 Running command: gsutil -q cp gs://zadravecm-bot/jobs/test_job4/packages/84f3c60920e885020405e1eb7afa5f509313d2a5406a1f1551a81b81993ac66c/trainer-1.0.tar.gz trainer-1.0.tar.gz master-replica-0
I master-replica-0 Installing the package: gs://zadravecm-bot/jobs/test_job4/packages/84f3c60920e885020405e1eb7afa5f509313d2a5406a1f1551a81b81993ac66c/trainer-1.0.tar.gz master-replica-0
I master-replica-0 Running command: pip3 install --user --upgrade --force-reinstall --no-deps trainer-1.0.tar.gz master-replica-0
I master-replica-0 Processing ./trainer-1.0.tar.gz master-replica-0
E master-replica-0 Exception: master-replica-0
E master-replica-0 Traceback (most recent call last): master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/basecommand.py", line 228, in main master-replica-0
E master-replica-0 status = self.run(options, args) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 291, in run master-replica-0
E master-replica-0 resolver.resolve(requirement_set) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 103, in resolve master-replica-0
E master-replica-0 self._resolve_one(requirement_set, req) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 257, in _resolve_one master-replica-0
E master-replica-0 abstract_dist = self._get_abstract_dist_for(req_to_install) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 210, in _get_abstract_dist_for master-replica-0
E master-replica-0 self.require_hashes master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/operations/prepare.py", line 310, in prepare_linked_requirement master-replica-0
E master-replica-0 progress_bar=self.progress_bar master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 824, in unpack_url master-replica-0
E master-replica-0 unpack_file_url(link, location, download_dir, hashes=hashes) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 729, in unpack_file_url master-replica-0
E master-replica-0 unpack_file(from_path, location, content_type, link) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/misc.py", line 578, in unpack_file master-replica-0
E master-replica-0 tarfile.is_tarfile(filename) or master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 2448, in is_tarfile master-replica-0
E master-replica-0 t = open(name) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1557, in open master-replica-0
E master-replica-0 return func(name, "r", fileobj, **kwargs) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1629, in gzopen master-replica-0
E master-replica-0 t = cls.taropen(name, mode, fileobj, **kwargs) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1605, in taropen master-replica-0
E master-replica-0 return cls(name, mode, fileobj, **kwargs) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1470, in __init__ master-replica-0
E master-replica-0 self.firstmember = self.next() master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 2279, in next master-replica-0
E master-replica-0 tarinfo = self.tarinfo.fromtarfile(self) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1082, in fromtarfile master-replica-0
E master-replica-0 buf = tarfile.fileobj.read(BLOCKSIZE) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/gzip.py", line 274, in read master-replica-0
E master-replica-0 return self._buffer.read(size) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/_compression.py", line 68, in readinto master-replica-0
E master-replica-0 data = self.read(len(byte_view)) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/gzip.py", line 469, in read master-replica-0
E master-replica-0 uncompress = self._decompressor.decompress(buf, size) master-replica-0
E master-replica-0 zlib.error: Error -3 while decompressing data: invalid distance too far back master-replica-0
E master-replica-0 You are using pip version 10.0.1, however version 18.0 is available. master-replica-0
E master-replica-0 You should consider upgrading via the 'pip install --upgrade pip' command. master-replica-0
W master-replica-0 Installation of package failed on try 1/2: Command '['pip3', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', 'trainer-1.0.tar.gz']' returned non-zero exit status 2
Retrying ... master-replica-0
I master-replica-0 Running command: pip3 install --user --upgrade --force-reinstall --no-deps trainer-1.0.tar.gz master-replica-0
I master-replica-0 Processing ./trainer-1.0.tar.gz master-replica-0
E master-replica-0 Exception: master-replica-0
E master-replica-0 Traceback (most recent call last): master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/basecommand.py", line 228, in main master-replica-0
E master-replica-0 status = self.run(options, args) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 291, in run master-replica-0
E master-replica-0 resolver.resolve(requirement_set) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 103, in resolve master-replica-0
E master-replica-0 self._resolve_one(requirement_set, req) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 257, in _resolve_one master-replica-0
E master-replica-0 abstract_dist = self._get_abstract_dist_for(req_to_install) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 210, in _get_abstract_dist_for master-replica-0
E master-replica-0 self.require_hashes master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/operations/prepare.py", line 310, in prepare_linked_requirement master-replica-0
E master-replica-0 progress_bar=self.progress_bar master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 824, in unpack_url master-replica-0
E master-replica-0 unpack_file_url(link, location, download_dir, hashes=hashes) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 729, in unpack_file_url master-replica-0
E master-replica-0 unpack_file(from_path, location, content_type, link) master-replica-0
E master-replica-0 File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/misc.py", line 578, in unpack_file master-replica-0
E master-replica-0 tarfile.is_tarfile(filename) or master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 2448, in is_tarfile master-replica-0
E master-replica-0 t = open(name) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1557, in open master-replica-0
E master-replica-0 return func(name, "r", fileobj, **kwargs) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1629, in gzopen master-replica-0
E master-replica-0 t = cls.taropen(name, mode, fileobj, **kwargs) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1605, in taropen master-replica-0
E master-replica-0 return cls(name, mode, fileobj, **kwargs) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1470, in __init__ master-replica-0
E master-replica-0 self.firstmember = self.next() master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 2279, in next master-replica-0
E master-replica-0 tarinfo = self.tarinfo.fromtarfile(self) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/tarfile.py", line 1082, in fromtarfile master-replica-0
E master-replica-0 buf = tarfile.fileobj.read(BLOCKSIZE) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/gzip.py", line 274, in read master-replica-0
E master-replica-0 return self._buffer.read(size) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/_compression.py", line 68, in readinto master-replica-0
E master-replica-0 data = self.read(len(byte_view)) master-replica-0
E master-replica-0 File "/usr/lib/python3.5/gzip.py", line 469, in read master-replica-0
E master-replica-0 uncompress = self._decompressor.decompress(buf, size) master-replica-0
E master-replica-0 zlib.error: Error -3 while decompressing data: invalid distance too far back master-replica-0
E master-replica-0 You are using pip version 10.0.1, however version 18.0 is available. master-replica-0
E master-replica-0 You should consider upgrading via the 'pip install --upgrade pip' command. master-replica-0
E master-replica-0 Command '['pip3', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', 'trainer-1.0.tar.gz']' returned non-zero exit status 2 master-replica-0
I master-replica-0 Module completed; cleaning up. master-replica-0
I master-replica-0 Clean up finished. master-replica-0
E The replica master 0 exited with a non-zero status of 2.
I Job failed.
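For context, the zlib.error in the log is raised while pip opens trainer-1.0.tar.gz as a gzip-compressed tar archive (the tarfile.is_tarfile call in the traceback). Below is a minimal sketch of the same kind of check that could be run locally on a built archive before submitting. The function name archive_is_readable is hypothetical and not part of my project, and the sketch only tests whether the archive can be read; it does not explain why it might be corrupt.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if the .tar.gz at `path` opens and all member headers can be read."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            # Walk every member header; a damaged gzip stream fails here or on open.
            archive.getmembers()
        return True
    except (tarfile.TarError, zlib.error, EOFError, OSError):
        # tarfile.ReadError (a TarError) covers "not a gzip file" and empty archives;
        # zlib.error is the exception shown in the traceback.
        return False
```

Usage would be something like archive_is_readable("dist/trainer-1.0.tar.gz"), where the dist/ path is an assumption about where a locally built package ends up.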
My terminal script is:
export JOB_NAME="test_job4"
export BUCKET_NAME="zadravecm-bot"
export CLOUD_CONFIG=trainer/cloudml-gpu.yaml
export JOB_DIR=gs://zadravecm-bot/jobs/$JOB_NAME
export MODULE=trainer.bot
export PACKAGE_PATH=./trainer
export REGION=us-central1
export RUNTIME=1.8
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --runtime-version $RUNTIME \
    --module-name $MODULE \
    --package-path $PACKAGE_PATH \
    --region $REGION \
    --config $CLOUD_CONFIG
and the GPU configuration (trainer/cloudml-gpu.yaml) is:
trainingInput:
  scaleTier: BASIC_GPU
  runtimeVersion: "1.8"
  pythonVersion: "3.5"
My application hierarchy is:
Bot
|
|---> trainer
|       |
|       |---> __init__.py
|       |---> bot.py
|       |---> cloudml-gpu.yaml
|
|---> setup.py