I am looking to schedule a Dataflow job that uses PubsubIO.readStrings() to read from a Pub/Sub topic's subscription. How can I have the job terminate after a configured interval? My use case does not require the job to run all day, so I want to schedule it to start and then have it stop, from within the job, after a configured interval.
Pipeline
.apply(PubsubIO.readMessages().fromSubscription("some-subscription"))
From docs:
I assume you are not interested in stopping jobs manually via the Console, which leaves you with the command-line solution. If you intend to schedule your Dataflow job to run, e.g., daily, then you also know at what time you want it to stop (launch time + "configured interval"). In that case, you could configure a cron job to run the gcloud dataflow jobs cancel command at that time every day. For instance, a script run by cron could cancel all active jobs that were launched within the day.

Another solution would be to invoke the gcloud command from within your Java code, using Runtime.getRuntime().exec(). You can schedule this to run after a specific interval using java.util.Timer.schedule(). This way you can ensure that your job stops after the provided time interval, regardless of when you launched it.

UPDATE
@RoshanFernando correctly noted in comments that there's actually an SDK method to cancel a pipeline.
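To make the timer idea above concrete, here is a minimal sketch that uses only the Java standard library. The class name TimedCancel, the helper names, and the command-line arguments (job ID, region, delay in minutes) are hypothetical and chosen for illustration. The cancel action is passed in as a Runnable, so it could shell out to gcloud as shown here or, in a launcher that still holds the PipelineResult returned by pipeline.run(), call the SDK's cancel() method instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class TimedCancel {

    /** Builds the gcloud command line that cancels a single Dataflow job. */
    static List<String> cancelCommand(String jobId, String region) {
        return Arrays.asList(
            "gcloud", "dataflow", "jobs", "cancel", jobId, "--region=" + region);
    }

    /** Runs cancelAction once, delayMillis from now, on a daemon timer thread. */
    static Timer scheduleCancel(Runnable cancelAction, long delayMillis) {
        Timer timer = new Timer("dataflow-cancel", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    cancelAction.run();
                } finally {
                    timer.cancel(); // one-shot: release the timer thread afterwards
                }
            }
        }, delayMillis);
        return timer;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical arguments: <jobId> <region> <delayMinutes>
        String jobId = args[0];
        String region = args[1];
        long delayMillis = TimeUnit.MINUTES.toMillis(Long.parseLong(args[2]));

        CountDownLatch done = new CountDownLatch(1);
        scheduleCancel(() -> {
            try {
                // Assumes gcloud is installed and authenticated on this machine.
                new ProcessBuilder(cancelCommand(jobId, region))
                    .inheritIO()
                    .start()
                    .waitFor();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                done.countDown();
            }
        }, delayMillis);

        // The timer thread is a daemon, so block here until the cancel has run.
        done.await();
    }
}
```

Note that the process running this code has to stay alive for the whole interval. If the launcher exits right after submitting the job, the timer never fires, and the cron approach is the safer choice.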