I want to experiment with the Dataflow Python SDK from a Jupyter notebook. I am not sure which dependencies are needed or whether I can spread the code over multiple notebook cells. What are the steps involved?
Answer 1:
Yes! There are no special steps involved. For example, with a Conda environment (recommended for IPython/Jupyter notebooks), the following commands set up the environment and start a Jupyter notebook:
- conda create -n TESTENV jupyter
- source activate TESTENV
- pip install https://github.com/GoogleCloudPlatform/DataflowPythonSDK/archive/v0.2.3.tar.gz
- jupyter notebook
The commands above install version v0.2.3 of the Dataflow Python SDK; change the version in the URL to the one you want. In the first notebook cell, execute the following import statement:
import google.cloud.dataflow as df
Now you are all set. You can spread the workflow code over multiple cells. Check out the following notebook describing a very simple workflow: https://github.com/silviulica/WorkflowExamples/blob/master/notebooks/HelloWorld.ipynb
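If the import statement fails, one likely cause is that the notebook kernel is not running inside the environment where the SDK was installed. As a rough sanity check, a cell like the following reports whether the module can be imported from the current kernel. The helper name is_importable is made up for this illustration and is not part of the SDK.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported in the current kernel.

    Hypothetical helper for illustration; not part of the Dataflow SDK.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# The module name comes from the import statement in the answer above.
print(is_importable("google.cloud.dataflow"))
```

If this prints False, check that jupyter notebook was started after activating the Conda environment.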