I would like to start a Dataproc job in response to log files arriving in a GCS bucket. I also do not want to keep a persistent cluster running, since new log files arrive only a few times a day and the cluster would be idle most of the time.
I can use the WorkflowTemplate API to manage the cluster lifecycle for me. With Dataproc Workflows I don't have to poll for the cluster to be created or the job to finish, or do any error handling: the workflow creates the cluster, runs the job, and deletes the cluster when it is done.
Here's my Cloud Function. Set it to trigger on the Cloud Storage bucket's finalize/create event.

index.js:
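The original code block did not survive here, so what follows is only a minimal sketch of what such a function could look like: a background function for the storage object finalize event that uses the @google-cloud/dataproc Node.js client to instantiate an inline workflow template (managed cluster, one PySpark job, automatic cluster deletion). The project ID, region, machine types, cluster name, and the process_logs.py driver path are placeholders, not values from the original answer.

```javascript
// index.js -- sketch of a background Cloud Function triggered by
// the finalize/create event on the log bucket.
const dataproc = require('@google-cloud/dataproc');

const PROJECT_ID = 'my-project';   // placeholder
const REGION = 'us-central1';      // placeholder

exports.startWorkflow = (data, context) => {
  // `data` is the GCS object metadata for the newly finalized file.
  const inputFile = `gs://${data.bucket}/${data.name}`;
  console.log('New log file:', inputFile);

  // Workflow template requests go to the regional Dataproc endpoint.
  const client = new dataproc.v1.WorkflowTemplateServiceClient({
    apiEndpoint: `${REGION}-dataproc.googleapis.com`,
  });

  const request = {
    parent: `projects/${PROJECT_ID}/regions/${REGION}`,
    template: {
      // Managed cluster: created for this run, deleted when the job finishes.
      placement: {
        managedCluster: {
          clusterName: 'log-processing',
          config: {
            masterConfig: { numInstances: 1, machineTypeUri: 'n1-standard-4' },
            workerConfig: { numInstances: 2, machineTypeUri: 'n1-standard-4' },
          },
        },
      },
      jobs: [
        {
          stepId: 'process-logs',
          pysparkJob: {
            // Placeholder driver script; the new file is passed as an argument.
            mainPythonFileUri: `gs://${PROJECT_ID}-code/process_logs.py`,
            args: [inputFile],
          },
        },
      ],
    },
  };

  // Launch the workflow and log the outcome; the workflow itself is a
  // long-running operation that Dataproc drives on its own.
  return client
    .instantiateInlineWorkflowTemplate(request)
    .then(() => console.log('Dataproc workflow launched'))
    .catch(err => console.error('Failed to launch workflow:', err));
};
```

Because Dataproc drives the workflow end to end (create cluster, run job, delete cluster), the function only has to launch it and log the result; there is no polling loop in the function itself.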
Make sure to set "Function to execute" to startWorkflow.

package.json:
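package.json was likewise lost in extraction; a minimal one only needs to declare the Dataproc client library dependency (the package name and version below are illustrative):

```json
{
  "name": "dataproc-log-workflow",
  "version": "1.0.0",
  "dependencies": {
    "@google-cloud/dataproc": "^2.0.0"
  }
}
```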