Is there anything in the Dataflow SDK that would allow me to stage resource files on a worker? I have specific static file resources that I need to make available on the file system for a custom DoFn that is performing NLP. My goal is to get a zip file resource from the classloader and unzip it on the worker file system only once as the worker is being initialized, rather than trying to do this in the custom DoFn.
You can specify --filesToStage to specify files that should be staged. There are several issues to be aware of:

- By default, the Dataflow SDK sets --filesToStage to all of the files in your classpath, which ensures that the code needed to run your pipeline is available to the worker. If you override this option you'll need to make sure that it includes your code.
- The staged files land on the worker with a hash appended to their names. So if you specified --filesToStage=foo.zip, the file name would be foo-<someHash>.zip. You would need to iterate over all the files in the classpath to find the appropriate one.

See the documentation on --filesToStage in https://cloud.google.com/dataflow/pipelines/executing-your-pipeline for some more info. A sketch of what that lookup and one-time extraction could look like is below.
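A minimal sketch of that approach, under some assumptions: the archive is hypothetically named models.zip and passed via --filesToStage (in addition to the default classpath entries), so it shows up on the worker as models-<someHash>.zip. The helper class, its name, and the models- prefix are illustrative, not part of the SDK. It scans the worker's classpath for the hash-renamed archive and extracts it only once per JVM.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper: finds the staged archive on the worker classpath and
 * extracts it exactly once per JVM. Assumes the archive was staged as
 * models.zip, so on the worker it appears as models-<someHash>.zip.
 */
public class StagedResources {

  private static volatile File extractedDir;

  /** Returns the directory the archive was extracted into, extracting it on first use. */
  public static synchronized File getResourceDir() throws Exception {
    if (extractedDir != null) {
      return extractedDir;
    }
    // The staged copy is renamed with a hash, so match by prefix and suffix
    // rather than by exact file name.
    File archive = findOnClasspath("models-", ".zip");
    File target = java.nio.file.Files.createTempDirectory("models").toFile();
    unzip(archive, target);
    extractedDir = target;
    return extractedDir;
  }

  private static File findOnClasspath(String prefix, String suffix) {
    for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
      File f = new File(entry);
      if (f.getName().startsWith(prefix) && f.getName().endsWith(suffix)) {
        return f;
      }
    }
    throw new IllegalStateException("Staged archive not found on classpath");
  }

  private static void unzip(File archive, File targetDir) throws Exception {
    try (ZipFile zip = new ZipFile(archive)) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        File out = new File(targetDir, entry.getName());
        if (entry.isDirectory()) {
          out.mkdirs();
          continue;
        }
        out.getParentFile().mkdirs();
        try (InputStream in = zip.getInputStream(entry);
             OutputStream os = new FileOutputStream(out)) {
          byte[] buf = new byte[8192];
          int n;
          while ((n = in.read(buf)) > 0) {
            os.write(buf, 0, n);
          }
        }
      }
    }
  }
}
```

Calling StagedResources.getResourceDir() from the custom DoFn (for example when a bundle starts) keeps the extraction to a single pass per worker JVM, since the static field survives across bundles and elements.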