Python Dependencies in Cloud DataFlow, requirement

Posted 2019-07-11 05:08


I am trying to get my Cloud DataFlow job running with a requirements.txt file as described here

Rather than building all of OpenCV from source (which takes 20-30 minutes), I can just install the Python package.

On my Compute Engine instance, this works:

root@fcfca6a4dad2:/DeepMeerkat# pip install opencv-python
Collecting opencv-python
  Downloading opencv_python- (6.7MB)
    100% |################################| 6.7MB 163kB/s
Collecting numpy>=1.11.1 (from opencv-python)
  Downloading numpy-1.13.0-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
    100% |################################| 16.6MB 68kB/s
Installing collected packages: numpy, opencv-python
  Found existing installation: numpy 1.8.2
    DEPRECATION: Uninstalling a distutils installed project (numpy) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling numpy-1.8.2:
      Successfully uninstalled numpy-1.8.2
Successfully installed numpy-1.13.0 opencv-python-

I can put this in a requirements file along with a few other modules:

root@fcfca6a4dad2:/DeepMeerkat# pip install -r tests/prediction/requirements.txt
Requirement already satisfied: opencv-python in /usr/local/lib/python2.7/dist-packages (from -r tests/prediction/requirements.txt (line 1))
Collecting tensorflow==1.0.1 (from -r tests/prediction/requirements.txt (line 2))
  Downloading tensorflow-1.0.1-cp27-cp27mu-manylinux1_x86_64.whl (44.1MB)
    100% |################################| 44.1MB 27kB/s
Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages (from -r tests/prediction/requirements.txt (line 3))
Requirement already satisfied: mock>=2.0.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: wheel in /usr/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: protobuf>=3.1.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: funcsigs>=1; python_version < "3.3" in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: pbr>=0.11 in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=3.1.0->tensorflow==1.0.1->-r tests/prediction/requirements.txt (line 2))
Installing collected packages: tensorflow
Successfully installed tensorflow-1.0.1

However, when I submit the job to Cloud Dataflow, it fails because opencv-python cannot be found:

root@fcfca6a4dad2:/DeepMeerkat# python tests/prediction/ \
>     --runner DataflowRunner \
>     --project $PROJECT \
>     --staging_location $BUCKET/staging \
>     --temp_location $BUCKET/temp \
>     --job_name $PROJECT-deepmeerkat \
>     --setup_file tests/prediction/ \
>     --requirements_file tests/prediction/requirements.txt
No handlers could be found for logger "oauth2client.contrib.multistore_file"
/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/ DeprecationWarning: object() takes no parameters
  super(GcsIO, cls).__new__(cls, storage_client))
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0855119228363 seconds
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0597159862518 seconds
/usr/local/lib/python2.7/dist-packages/apache_beam/coders/ UserWarning: Using fallback coder for typehint: Any.
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
INFO:root:Starting GCS upload to gs://api-project-773889352370-testing/staging/api-project-773889352370-deepmeerkat.1499372970.163850/requirements.txt...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Completed GCS upload to gs://api-project-773889352370-testing/staging/api-project-773889352370-deepmeerkat.1499372970.163850/requirements.txt
INFO:root:Executing command: ['/usr/bin/python', '-m', 'pip', 'install', '--download', '/tmp/dataflow-requirements-cache', '-r', 'tests/prediction/requirements.txt', '--no-binary', ':all:']
DEPRECATION: pip install --download has been deprecated and will be removed in the future. Pip now has a download command that should be used instead.
Collecting opencv-python (from -r tests/prediction/requirements.txt (line 1))
  Could not find a version that satisfies the requirement opencv-python (from -r tests/prediction/requirements.txt (line 1)) (from versions: )
No matching distribution found for opencv-python (from -r tests/prediction/requirements.txt (line 1))
Traceback (most recent call last):
  File "tests/prediction/", line 22, in <module>
  File "/DeepMeerkat/tests/prediction/modules/", line 32, in run
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/", line 167, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/", line 176, in run
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/", line 252, in run
    self.dataflow_client.create_job(self.job), self)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/", line 168, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/", line 425, in create_job
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/", line 448, in create_job_description
    job.options, file_copy=self._gcs_file_copy)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/", line 307, in stage_job_resources
    setup_options.requirements_file, requirements_cache_path)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/", line 241, in _populate_requirements_cache
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/", line 44, in check_call
    return subprocess.check_call(*args, **kwargs)
  File "/usr/lib/python2.7/", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'pip', 'install', '--download', '/tmp/dataflow-requirements-cache', '-r', 'tests/prediction/requirements.txt', '--no-binary', ':all:']' returned non-zero exit status 1

The --no-binary flag seems to be the problem. Running the same install locally with that flag (after uninstalling the packages above) reproduces the error:

root@fcfca6a4dad2:/DeepMeerkat# pip install -r tests/prediction/requirements.txt --no-binary :all:
Collecting opencv-python (from -r tests/prediction/requirements.txt (line 1))
  Could not find a version that satisfies the requirement opencv-python (from -r tests/prediction/requirements.txt (line 1)) (from versions: )
No matching distribution found for opencv-python (from -r tests/prediction/requirements.txt (line 1))

The --no-binary flag is described as a way to exclude broken wheels. How does that apply in this case?

I can confirm that without the flag the module installs and imports:


root@fcfca6a4dad2:/DeepMeerkat# pip install opencv-python
Collecting opencv-python
  Using cached opencv_python-
Requirement already satisfied: numpy>=1.11.1 in /usr/local/lib/python2.7/dist-packages (from opencv-python)
Installing collected packages: opencv-python
Successfully installed opencv-python-
root@fcfca6a4dad2:/DeepMeerkat# python
Python 2.7.9 (default, Jun 29 2016, 13:08:31)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2


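I think the error you are seeing is caused by the wheel file not being usable during the requirements step: the log shows pip being run with --no-binary :all:, which tells pip to ignore wheels, so a package that is only available as a wheel has no installable candidate. As noted on the opencv-python package page, problems with wheel files may cause the package to appear as not found.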
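One way to check this for yourself is to look at which file types a package publishes on PyPI. The snippet below is a minimal sketch in Python 3, assuming the PyPI JSON endpoint at https://pypi.org/pypi/<name>/json and its "urls" list with a "packagetype" field per file. The helper names has_sdist and fetch_release_files are my own, not part of pip or Beam. Note that it reports what the index holds today, which may differ from what was published when the question was asked.

```python
import json
from urllib.request import urlopen


def has_sdist(release_files):
    """Return True if any file in a PyPI release listing is a source distribution."""
    return any(f.get("packagetype") == "sdist" for f in release_files)


def fetch_release_files(package):
    """Fetch the file listing for the latest release of `package` from PyPI.

    Assumes the PyPI JSON endpoint; requires network access.
    """
    url = "https://pypi.org/pypi/{}/json".format(package)
    with urlopen(url) as response:
        return json.load(response)["urls"]


# Example usage (needs network access, so it is left commented out):
# files = fetch_release_files("opencv-python")
# print("has sdist:", has_sdist(files))
```

If has_sdist returns False for a package, pip has nothing to pick when it is run with --no-binary :all:, which would match the empty "(from versions: )" list in the log.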

In this case, you can follow the instructions for packages that are not on PyPI: remove opencv-python from the requirements file and pass --extra_package <local path to wheel file> instead. This should cause the wheel file to be staged and installed on each worker.
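As a rough sketch of that change, the helpers below split a requirements list into the entries that stay in requirements.txt and the ones that move to --extra_package, then build the extra command-line arguments. The function names and the wheel path are hypothetical. Which packages count as wheel-only is something you have to decide yourself, and the wheel you stage has to be compatible with the Python version and platform on the workers.

```python
import re


def split_requirements(lines, wheel_only):
    """Separate requirement lines into (kept, moved) based on project name.

    `wheel_only` is a set of lowercase project names that should be staged
    as wheel files instead of being listed in requirements.txt.
    """
    kept, moved = [], []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        # Take the bare project name, dropping any version specifier or marker.
        name = re.split(r"[<>=!~;\[ ]", entry, maxsplit=1)[0].lower()
        (moved if name in wheel_only else kept).append(entry)
    return kept, moved


def extra_package_args(wheel_paths):
    """Build one --extra_package argument pair per local wheel path."""
    args = []
    for path in wheel_paths:
        args += ["--extra_package", path]
    return args


# The three requirements from the question, with opencv-python treated as wheel-only.
kept, moved = split_requirements(
    ["opencv-python", "tensorflow==1.0.1", "numpy"], {"opencv-python"}
)
# "/path/to/opencv_python.whl" is a placeholder for a wheel you downloaded locally.
args = extra_package_args(["/path/to/opencv_python.whl"])
```

The kept entries would be written back to the requirements file, and args would be appended to the pipeline command line in place of listing opencv-python as a requirement. Any other package that is only published as a wheel would need the same treatment.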