How to set up PyTorch in google-cloud-ml

Posted 2019-02-28 10:49

Question:

I am trying to submit a job that runs PyTorch code on google-cloud-ml, so I wrote a "setup.py" file and added the "install_requires" option.

"setup.py"

from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['http://download.pytorch.org/whl/cpu/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl','torchvision']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My keras trainer application package.'
)

I then submitted the job to google-cloud-ml, but it does not work.

It fails with this error message:

{
 insertId:  "3m78xtf9czd0u"  
 jsonPayload: {
  created:  1516845879.49039   
  levelname:  "ERROR"   
  lineno:  829   
  message:  "Command '['pip', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', u'trainer-0.1.tar.gz']' returned non-zero exit status 1"   
  pathname:  "/runcloudml.py"   
 }
 labels: {
  compute.googleapis.com/resource_id:  "6637909247101536087"   
  compute.googleapis.com/resource_name:  "cmle-training-master-5502b52646-0-ql9ds"   
  compute.googleapis.com/zone:  "us-central1-c"   
  ml.googleapis.com/job_id:  "run_ml_engine_pytorch_test_20180125_015752"   
  ml.googleapis.com/job_id/log_area:  "root"   
  ml.googleapis.com/task_name:  "master-replica-0"   
  ml.googleapis.com/trial_id:  ""   
 }
 logName:  "projects/exem-191100/logs/master-replica-0"  
 receiveTimestamp:  "2018-01-25T02:04:55.421517460Z"  
 resource: {
  labels: {…}   
  type:  "ml_job"   
 }
 severity:  "ERROR"  
 timestamp:  "2018-01-25T02:04:39.490387916Z"  
}

====================================================================

See detailed message here

So how can I use PyTorch in Google Cloud ML Engine?

Answer 1:

I found a solution for setting up PyTorch in google-cloud-ml.

First, get a PyTorch .whl file and store it in a Google Cloud Storage bucket. That gives you a gs:// link to the file in the bucket:

gs://bucketname/directory/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl

Which .whl file you need depends on your Python version and CUDA version.
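The tags in the wheel filename show which interpreter and platform a given build targets. The following is a minimal sketch, assuming the standard wheel naming scheme (distribution, version, optional build tag, Python tag, ABI tag, platform tag); `parse_wheel_filename` is a hypothetical helper written for illustration, not part of pip or setuptools.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-scheme components.

    Hypothetical helper for illustration only.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Five parts without a build tag, six parts with one.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],    # e.g. cp27 means CPython 2.7
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


info = parse_wheel_filename("torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl")
print(info["python_tag"])
print(info["platform_tag"])
```

If the Python tag does not match the interpreter used by the training runtime, pip will refuse to install the wheel, so it is worth checking before uploading the file to the bucket.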

Second, write the command line and the setup.py file, because you have to configure the google-cloud-ml job. The related link for submitting the job is submit_job_to_ml-engine. The setup.py file describes your package setup; the related link is write_setup.py_file.

This is my command and my setup.py file:

===================================================================== "command"

#commandline code
JOB_NAME="run_ml_engine_pytorch_test_$(date +%Y%m%d_%H%M%S)"
REGION=us-central1
OUTPUT_PATH=gs://yourbucket
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.4 \
    --module-name models.pytorch_test \
    --package-path models/ \
    --packages gs://yourbucket/directory/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl \
    --region $REGION \
    -- \
    --verbosity DEBUG

===================================================================== "setup.py"

from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['torchvision']
setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My pytorch trainer application package.'
)

=====================================================================

Third, if you have experience submitting jobs to ml-engine, you probably already know the file structure it expects; see packaging_training_model. Follow that link to learn how to package the files.
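As a rough pre-submission check, the layout implied by the command above (`--package-path models/` and `--module-name models.pytorch_test`) can be verified locally. This is only a sketch: `check_layout` is a hypothetical helper, and it assumes setup.py sits next to the package directory, which is the arrangement I understand the packaging guide to recommend.

```python
import os
import tempfile


def check_layout(root, package, module):
    """Return the expected files that are missing under root.

    Hypothetical helper; assumes setup.py lives beside the package directory.
    """
    expected = [
        os.path.join(root, "setup.py"),
        os.path.join(root, package, "__init__.py"),
        os.path.join(root, package, module + ".py"),
    ]
    return [path for path in expected if not os.path.isfile(path)]


# Build a throwaway project that lacks the training module on purpose.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "models"))
for relative in ("setup.py", os.path.join("models", "__init__.py")):
    open(os.path.join(root, relative), "w").close()

missing = check_layout(root, "models", "pytorch_test")
print(missing)  # lists the path of the missing pytorch_test.py
```

An empty list means the three files the job relies on are in place; anything else names what still has to be added before packaging.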



Answer 2:

The actual error message is a bit buried, but it is this:

'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Invalid requirement, parse error at "'://downl'"

To use packages not hosted on PyPI, you need to use dependency_links (see this documentation). Something like this ought to work:

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['torchvision']
DEPENDENCY_LINKS = ['http://download.pytorch.org/whl/cpu/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    dependency_links=DEPENDENCY_LINKS,
    packages=find_packages(),
    include_package_data=True,
    description='My keras trainer application package.'
)
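The separation above, URLs in dependency_links and plain names in install_requires, can also be done mechanically. The sketch below is illustrative only: `split_requirements` is a hypothetical helper, not a setuptools function, and it assumes every URL entry uses the http or https scheme.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def split_requirements(entries):
    """Separate URL entries from plain requirement specifiers.

    Hypothetical helper; only http and https URLs are recognized.
    """
    requirements, links = [], []
    for entry in entries:
        if urlparse(entry).scheme in ("http", "https"):
            links.append(entry)
        else:
            requirements.append(entry)
    return requirements, links


# The list from the question, which mixes a URL with a package name.
ORIGINAL = [
    'http://download.pytorch.org/whl/cpu/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl',
    'torchvision',
]

REQUIRED_PACKAGES, DEPENDENCY_LINKS = split_requirements(ORIGINAL)
print(REQUIRED_PACKAGES)
print(DEPENDENCY_LINKS)
```

The two resulting lists can then be passed to `install_requires` and `dependency_links` respectively, as in the setup.py above.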