Error when submitting the gcloud task to google cl

Posted 2019-08-21 04:45

Question:

I am new to Google Cloud ML Engine. I want to submit a Keras model to the cloud for training, but the job always fails with this error:

I  master-replica-0 Running module trainer.bot.  master-replica-0
I  master-replica-0 Downloading the package: gs://zadravecm-bot/jobs/test_job4/packages/84f3c60920e885020405e1eb7afa5f509313d2a5406a1f1551a81b81993ac66c/trainer-1.0.tar.gz  master-replica-0
I  master-replica-0 Running command: gsutil -q cp gs://zadravecm-bot/jobs/test_job4/packages/84f3c60920e885020405e1eb7afa5f509313d2a5406a1f1551a81b81993ac66c/trainer-1.0.tar.gz trainer-1.0.tar.gz  master-replica-0
I  master-replica-0 Installing the package: gs://zadravecm-bot/jobs/test_job4/packages/84f3c60920e885020405e1eb7afa5f509313d2a5406a1f1551a81b81993ac66c/trainer-1.0.tar.gz  master-replica-0
I  master-replica-0 Running command: pip3 install --user --upgrade --force-reinstall --no-deps trainer-1.0.tar.gz  master-replica-0
I  master-replica-0 Processing ./trainer-1.0.tar.gz  master-replica-0
E  master-replica-0 Exception:  master-replica-0
E  master-replica-0 Traceback (most recent call last):  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/basecommand.py", line 228, in main  master-replica-0
E  master-replica-0     status = self.run(options, args)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 291, in run  master-replica-0
E  master-replica-0     resolver.resolve(requirement_set)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 103, in resolve  master-replica-0
E  master-replica-0     self._resolve_one(requirement_set, req)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 257, in _resolve_one  master-replica-0
E  master-replica-0     abstract_dist = self._get_abstract_dist_for(req_to_install)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 210, in _get_abstract_dist_for  master-replica-0
E  master-replica-0     self.require_hashes  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/operations/prepare.py", line 310, in prepare_linked_requirement  master-replica-0
E  master-replica-0     progress_bar=self.progress_bar  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 824, in unpack_url  master-replica-0
E  master-replica-0     unpack_file_url(link, location, download_dir, hashes=hashes)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 729, in unpack_file_url  master-replica-0
E  master-replica-0     unpack_file(from_path, location, content_type, link)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/misc.py", line 578, in unpack_file  master-replica-0
E  master-replica-0     tarfile.is_tarfile(filename) or  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 2448, in is_tarfile  master-replica-0
E  master-replica-0     t = open(name)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1557, in open  master-replica-0
E  master-replica-0     return func(name, "r", fileobj, **kwargs)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1629, in gzopen  master-replica-0
E  master-replica-0     t = cls.taropen(name, mode, fileobj, **kwargs)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1605, in taropen  master-replica-0
E  master-replica-0     return cls(name, mode, fileobj, **kwargs)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1470, in __init__  master-replica-0
E  master-replica-0     self.firstmember = self.next()  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 2279, in next  master-replica-0
E  master-replica-0     tarinfo = self.tarinfo.fromtarfile(self)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1082, in fromtarfile  master-replica-0
E  master-replica-0     buf = tarfile.fileobj.read(BLOCKSIZE)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/gzip.py", line 274, in read  master-replica-0
E  master-replica-0     return self._buffer.read(size)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/_compression.py", line 68, in readinto  master-replica-0
E  master-replica-0     data = self.read(len(byte_view))  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/gzip.py", line 469, in read  master-replica-0
E  master-replica-0     uncompress = self._decompressor.decompress(buf, size)  master-replica-0
E  master-replica-0 zlib.error: Error -3 while decompressing data: invalid distance too far back  master-replica-0
E  master-replica-0 You are using pip version 10.0.1, however version 18.0 is available.  master-replica-0
E  master-replica-0 You should consider upgrading via the 'pip install --upgrade pip' command.  master-replica-0
W  master-replica-0 Installation of package failed on try 1/2: Command '['pip3', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', 'trainer-1.0.tar.gz']' returned non-zero exit status 2
Retrying ...  master-replica-0
I  master-replica-0 Running command: pip3 install --user --upgrade --force-reinstall --no-deps trainer-1.0.tar.gz  master-replica-0
I  master-replica-0 Processing ./trainer-1.0.tar.gz  master-replica-0
E  master-replica-0 Exception:  master-replica-0
E  master-replica-0 Traceback (most recent call last):  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/basecommand.py", line 228, in main  master-replica-0
E  master-replica-0     status = self.run(options, args)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 291, in run  master-replica-0
E  master-replica-0     resolver.resolve(requirement_set)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 103, in resolve  master-replica-0
E  master-replica-0     self._resolve_one(requirement_set, req)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 257, in _resolve_one  master-replica-0
E  master-replica-0     abstract_dist = self._get_abstract_dist_for(req_to_install)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/resolve.py", line 210, in _get_abstract_dist_for  master-replica-0
E  master-replica-0     self.require_hashes  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/operations/prepare.py", line 310, in prepare_linked_requirement  master-replica-0
E  master-replica-0     progress_bar=self.progress_bar  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 824, in unpack_url  master-replica-0
E  master-replica-0     unpack_file_url(link, location, download_dir, hashes=hashes)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/download.py", line 729, in unpack_file_url  master-replica-0
E  master-replica-0     unpack_file(from_path, location, content_type, link)  master-replica-0
E  master-replica-0   File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/misc.py", line 578, in unpack_file  master-replica-0
E  master-replica-0     tarfile.is_tarfile(filename) or  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 2448, in is_tarfile  master-replica-0
E  master-replica-0     t = open(name)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1557, in open  master-replica-0
E  master-replica-0     return func(name, "r", fileobj, **kwargs)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1629, in gzopen  master-replica-0
E  master-replica-0     t = cls.taropen(name, mode, fileobj, **kwargs)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1605, in taropen  master-replica-0
E  master-replica-0     return cls(name, mode, fileobj, **kwargs)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1470, in __init__  master-replica-0
E  master-replica-0     self.firstmember = self.next()  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 2279, in next  master-replica-0
E  master-replica-0     tarinfo = self.tarinfo.fromtarfile(self)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/tarfile.py", line 1082, in fromtarfile  master-replica-0
E  master-replica-0     buf = tarfile.fileobj.read(BLOCKSIZE)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/gzip.py", line 274, in read  master-replica-0
E  master-replica-0     return self._buffer.read(size)  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/_compression.py", line 68, in readinto  master-replica-0
E  master-replica-0     data = self.read(len(byte_view))  master-replica-0
E  master-replica-0   File "/usr/lib/python3.5/gzip.py", line 469, in read  master-replica-0
E  master-replica-0     uncompress = self._decompressor.decompress(buf, size)  master-replica-0
E  master-replica-0 zlib.error: Error -3 while decompressing data: invalid distance too far back  master-replica-0
E  master-replica-0 You are using pip version 10.0.1, however version 18.0 is available.  master-replica-0
E  master-replica-0 You should consider upgrading via the 'pip install --upgrade pip' command.  master-replica-0
E  master-replica-0 Command '['pip3', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', 'trainer-1.0.tar.gz']' returned non-zero exit status 2  master-replica-0
I  master-replica-0 Module completed; cleaning up.  master-replica-0
I  master-replica-0 Clean up finished.  master-replica-0
E  The replica master 0 exited with a non-zero status of 2. 
I  Job failed. 
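
Both tracebacks end inside tarfile and gzip while pip unpacks trainer-1.0.tar.gz, so the uploaded archive itself may be unreadable. Below is a minimal sketch for checking a local copy of the archive, assuming it has been copied down with the same gsutil cp command the log shows. The helper name check_sdist is hypothetical and is not part of pip or gcloud.

```python
import gzip
import tarfile
import zlib


def check_sdist(path):
    """Return (True, member_names) if the .tar.gz reads cleanly,
    otherwise (False, error_description)."""
    try:
        # Pass 1: decompress the whole gzip stream. Reading to the end
        # also makes the gzip module verify the stored CRC32 and length.
        with gzip.open(path, "rb") as stream:
            while stream.read(1024 * 1024):
                pass
        # Pass 2: walk the tar headers, as pip does before installing.
        with tarfile.open(path, mode="r:gz") as archive:
            names = [member.name for member in archive]
        return True, names
    except (tarfile.TarError, zlib.error, EOFError, OSError) as exc:
        return False, "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    # Assumed local file name, matching the package name in the log.
    print(check_sdist("trainer-1.0.tar.gz"))
```

If this reports a failure for the downloaded file, the archive was probably damaged before or during upload. If it succeeds, the problem more likely lies in how the job fetched the file.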

My terminal script is:

export JOB_NAME="test_job4"
export BUCKET_NAME="zadravecm-bot"
export CLOUD_CONFIG=trainer/cloudml-gpu.yaml
export JOB_DIR=gs://$BUCKET_NAME/jobs/$JOB_NAME
export MODULE=trainer.bot
export PACKAGE_PATH=./trainer
export REGION=us-central1
export RUNTIME=1.8

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --runtime-version $RUNTIME \
    --module-name $MODULE \
    --package-path $PACKAGE_PATH \
    --region $REGION \
    --config $CLOUD_CONFIG

The GPU configuration (trainer/cloudml-gpu.yaml) is:

trainingInput:
  scaleTier: BASIC_GPU
  runtimeVersion: "1.8"
  pythonVersion: "3.5"

My application hierarchy is:

Bot
|
|---> trainer
    |
    | ---> __init__.py
    | ---> bot.py
    | ---> cloudml-gpu.yaml
|
|---> setup.py