I have a streaming Dataflow pipeline that reads from a Pub/Sub subscription.
After a period of time, or perhaps after processing a certain amount of data, I want the pipeline to stop by itself. I don't want my Compute Engine instances to run indefinitely.
When I cancel the job through the Dataflow console, it is shown as a failed job.
Is there a way to achieve this? Am I missing something, or is that feature missing from the API?
Could you do something like this?
Pipeline pipeline = ...;
... (construct the streaming pipeline) ...

// Run the pipeline and keep a handle on the resulting job.
final DataflowPipelineJob job =
    DataflowPipelineRunner.fromOptions(pipelineOptions)
        .run(pipeline);

// Wait for the desired duration, then cancel the job programmatically.
Thread.sleep(timeoutMillis); // your timeout
job.cancel();
I was able to drain (cancel a job without losing data) a running streaming job on Dataflow with the REST API.
See my answer.
Use the REST update method with this body:
{ "requestedState": "JOB_STATE_DRAINING" }
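Concretely, that body goes to the Dataflow `projects.locations.jobs.update` REST method. A sketch of the raw HTTP call, where the project ID, region, and job ID are placeholders you must fill in:

```
PUT https://dataflow.googleapis.com/v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}
Authorization: Bearer <access token>
Content-Type: application/json

{ "requestedState": "JOB_STATE_DRAINING" }
```

While draining, the job stops pulling from Pub/Sub, finishes processing the data it has already read, and then terminates, ending in the JOB_STATE_DRAINED state rather than showing as failed.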