We need a non-Python dependency installed into our Dataflow process (we need an ODBC driver to access an MSSQL DB).
We've written a `setup.py` that successfully installs those using the steps here: https://cloud.google.com/dataflow/pipelines/dependencies-python#non-python-dependencies
We want to keep our original `setup.py` for the package (which doesn't install those extra dependencies); is there a way of using a different `setup.py` for Dataflow installs?
We tried:
- calling it `setup_dataflow.py`, but Dataflow raised an error stating it needed to be called `setup.py`.
- following the steps here, and using a `setup.py` within a child path to the root path. We weren't successful at that.
We could try an `if` statement within `setup.py` to identify whether it's being installed in a Dataflow environment (though I couldn't find any reliable environment variables to identify this).
Any advice / suggestions?
Thanks
Currently there's no convenient way to do this. You could have two different packages, something like so:
Where `dataflow_pipeline/setup.py` simply imports `original_package`, and adds the extra dependencies. It's not ideal, but it should work.
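As a rough illustration of the wrapper idea, here is a minimal sketch of what `dataflow_pipeline/setup.py` might look like. Everything beyond the two package names is an assumption on my part: the `echo` command is a placeholder for the real ODBC driver install steps, the helper names (`run_extra_install_commands`, `build_setup_kwargs`, `BuildPyWithExtras`) are made up for the example, and it assumes `original_package` can be installed on the workers and that the wrapper directory contains at least one Python package so the `build_py` step actually runs.

```python
# dataflow_pipeline/setup.py (hypothetical sketch, not the official pattern)
import subprocess
import sys

# Placeholder: replace with the real commands that install the ODBC driver.
EXTRA_INSTALL_COMMANDS = [
    ["echo", "install the ODBC driver here"],
]


def run_extra_install_commands(commands=EXTRA_INSTALL_COMMANDS,
                               runner=subprocess.check_call):
    """Run each non-Python install command in order, failing on the first error."""
    for command in commands:
        runner(command)


def build_setup_kwargs():
    """Metadata for the wrapper package; it depends on the original package."""
    return {
        "name": "dataflow_pipeline",
        "version": "0.0.1",
        # Assumption: original_package is installable on the workers.
        "install_requires": ["original_package"],
    }


# Only call setup() when a command (e.g. sdist) was passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import find_packages, setup
    from setuptools.command.build_py import build_py

    class BuildPyWithExtras(build_py):
        """Run the extra install commands before the normal build step."""

        def run(self):
            run_extra_install_commands()
            super().run()

    setup(
        packages=find_packages(),
        cmdclass={"build_py": BuildPyWithExtras},
        **build_setup_kwargs(),
    )
```

The pipeline would then be launched with `--setup_file` pointing at this file, while the original package keeps its own untouched `setup.py`.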