Question:

I am writing a custom sink with the Python SDK and trying to store data in AWS S3. Connecting to S3 requires a credential (a secret key), but hard-coding it is not good for security reasons. I would like to make these credentials available to the Dataflow workers as environment variables. How can I do that?
Answer 1:
Generally, to pass information to workers that you don't want to hard-code, you should use PipelineOptions; please see Creating Custom Options. Then, when constructing the pipeline, extract the parameters from your PipelineOptions object and put them into your transform (e.g. into your DoFn or a sink).
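A minimal sketch of that pattern, assuming the Apache Beam Python SDK; the option name --s3_output_path, the S3SinkOptions class, and the WriteToS3Fn placeholder are illustrative names, not part of the original answer:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class S3SinkOptions(PipelineOptions):
    """Custom pipeline options carrying the parameters the sink needs."""

    @classmethod
    def _add_argparse_args(cls, parser):
        # Each add_argument call defines a --flag accepted on the command line.
        parser.add_argument('--s3_output_path',
                            help='S3 path the custom sink writes to (illustrative)')


class WriteToS3Fn(beam.DoFn):
    """Placeholder DoFn standing in for the custom S3 sink."""

    def __init__(self, output_path):
        # Plain values extracted from the options are serialized with the DoFn
        # and shipped to the workers, so they are available at process() time.
        self.output_path = output_path

    def process(self, element):
        # ... write `element` to self.output_path with your S3 client here ...
        yield element


def run(argv=None):
    options = PipelineOptions(argv)
    s3_options = options.view_as(S3SinkOptions)

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input/*.txt')
         | 'WriteToS3' >> beam.ParDo(WriteToS3Fn(s3_options.s3_output_path)))
```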
However, passing something as sensitive as a credential in a command-line argument might not be a great idea. I would recommend a more secure approach: put the credential into a file on GCS, pass the name of that file as a PipelineOption, and then programmatically read the file from GCS whenever you need the credential, using GcsIO.
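That GCS-file approach might look roughly like the sketch below, assuming the apache_beam.io.gcp.gcsio module; the credentials path, JSON layout, and key names are illustrative assumptions, not part of the original answer. You would typically call the helper on the worker, e.g. from a DoFn's setup or start_bundle method, so the secret never appears on the command line or in your code.

```python
import json

from apache_beam.io.gcp.gcsio import GcsIO


def load_s3_credentials(credentials_file):
    """Read S3 credentials from a small JSON file stored on GCS.

    `credentials_file` is the value passed via a pipeline option, e.g.
    --credentials_file=gs://my-bucket/secrets/s3.json (illustrative path;
    the JSON key names below are also assumptions).
    """
    with GcsIO().open(credentials_file, 'r') as f:
        creds = json.loads(f.read())
    return creds['aws_access_key_id'], creds['aws_secret_access_key']
```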