Google Dataflow non-python dependencies - separate

Posted 2019-08-26 02:52

We need a non-Python dependency installed into our Dataflow process (specifically, an ODBC driver to access an MSSQL DB).

We've written a setup.py that successfully installs those using the steps here: https://cloud.google.com/dataflow/pipelines/dependencies-python#non-python-dependencies

We want to keep our original setup.py for the package (which doesn't install those extra dependencies); is there a way of using a different setup.py for Dataflow installs?

We tried:

  • calling it setup_dataflow.py, but Dataflow raised an error stating it needed to be called setup.py.
  • following the steps here and using a setup.py in a child path of the root path. We weren't successful with that.

We could try an if statement within setup.py to detect whether it's being installed in a Dataflow environment (though I couldn't find any reliable environment variables to identify this).

Any advice / suggestions?

Thanks

1 answer
Melony?
Reply #2 · 2019-08-26 03:33

Currently there's no convenient way to do this. You could have two different packages, something like so:

+- dataflow_pipeline
++- setup.py
+- original_pipeline
++- setup.py
++- pipeline.py

Where dataflow_pipeline/setup.py simply depends on original_pipeline and adds the extra dependencies.
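
As a rough illustration of that layout, the pieces dataflow_pipeline/setup.py would need might look something like the sketch below. Everything specific here is an assumption rather than something from the question: the apt package unixodbc-dev stands in for whatever the ODBC driver actually requires, pyodbc is assumed as the Python-side client, and the version number and helper names (run_custom_commands, dataflow_setup_kwargs) are hypothetical.

```python
"""Sketch of the building blocks for dataflow_pipeline/setup.py."""
import subprocess

# Assumed shell commands for installing the non-Python dependency on a worker.
# Placeholder only: the real MSSQL ODBC driver may need different packages.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "unixodbc-dev"],
]


def run_custom_commands(commands=CUSTOM_COMMANDS, runner=subprocess.check_call):
    """Run each install command in order; check_call raises if one fails."""
    for command in commands:
        runner(command)


def dataflow_setup_kwargs():
    """Return setup() keyword arguments for the Dataflow-only wrapper package.

    The wrapper depends on original_pipeline (hypothetical distribution name)
    and adds the extra Python-level requirement.
    """
    return {
        "name": "dataflow_pipeline",
        "version": "0.1.0",  # hypothetical
        "install_requires": ["original_pipeline", "pyodbc"],
    }
```

In the real setup.py, dataflow_setup_kwargs() would be passed to setuptools.setup(), and run_custom_commands() would be called from the run() method of a custom command class registered through cmdclass, following the pattern in the Dataflow documentation linked in the question. Note that original_pipeline would still need to be made available to the workers separately if it is not published to a package index.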

It's not ideal, but it should work.
