Using google cloud dataflow PubSubIO, when does th

Posted 2019-02-20 14:20

Question:

Is it possible to delay acknowledgement until the subgraph (everything below the PubSubIO.Read) is successfully processed?

For example, we stream reads from a Google Pub/Sub subscription, then write a file to GCS in one branch and write to BigQuery with BigQueryIO.Write in another.

We see that if an exception occurs, the message is retried indefinitely, since the job runs in streaming mode. However, if we cancel the job and redeploy it with a code change, the message is not reprocessed.

Answer 1:

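The acknowledgement is made once the message is durably persisted somewhere in the Dataflow pipeline, not when the whole subgraph below the read has finished. Cancelling a job discards that persisted in-flight data, and because those messages were already acknowledged, Pub/Sub does not redeliver them to the redeployed job. If you want to make changes to a pipeline without losing in-flight data, use the Update feature instead of Cancel (relaunch the changed pipeline with the --update option and the same job name as the running job): https://cloud.google.com/dataflow/pipelines/updating-a-pipeline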
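
To make the difference between Cancel and Update concrete, here is a toy model in plain Java. It is only a sketch of the behaviour described above, not how Dataflow is implemented: FakeSubscription, FakeJob and every method on them are hypothetical names invented for illustration, and none of them are Beam or Dataflow APIs. The one assumption it encodes is that a message is acknowledged as soon as the job has persisted it, before the downstream writes succeed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: none of these classes are Beam or Dataflow APIs.
class AckTimingSketch {

    // Hypothetical stand-in for a Pub/Sub subscription.
    static class FakeSubscription {
        final Deque<String> unacked = new ArrayDeque<>();

        void publish(String msg) { unacked.add(msg); }
        String pull() { return unacked.peek(); }   // delivered, still unacked
        void ack(String msg) { unacked.remove(msg); }
    }

    // Hypothetical stand-in for a streaming job with its own durable state.
    static class FakeJob {
        final List<String> durableState = new ArrayList<>(); // persisted, not yet written out
        final List<String> sinkOutput = new ArrayList<>();   // what reached GCS / BigQuery

        // Read step: persist the message inside the job, then acknowledge it.
        void read(FakeSubscription sub) {
            String msg = sub.pull();
            if (msg == null) {
                return; // nothing left on the subscription
            }
            durableState.add(msg);
            sub.ack(msg);
        }

        // Downstream step: on failure the message stays in durable state for retry.
        void process(boolean downstreamHealthy) {
            if (!downstreamHealthy) {
                return;
            }
            sinkOutput.addAll(durableState);
            durableState.clear();
        }

        // "Update": the replacement job inherits the persisted in-flight data.
        FakeJob update() {
            FakeJob next = new FakeJob();
            next.durableState.addAll(durableState);
            return next;
        }

        // "Cancel" then redeploy: the persisted in-flight data is discarded.
        FakeJob cancelAndRedeploy() {
            return new FakeJob();
        }
    }

    // Runs one message through a failing job, then through the fixed job.
    static List<String> run(boolean useUpdate) {
        FakeSubscription sub = new FakeSubscription();
        sub.publish("m1");

        FakeJob job = new FakeJob();
        job.read(sub);        // m1 is persisted in the job and acknowledged
        job.process(false);   // downstream write fails, m1 waits for retry

        FakeJob fixed = useUpdate ? job.update() : job.cancelAndRedeploy();
        fixed.read(sub);      // nothing to pull: m1 was already acknowledged
        fixed.process(true);
        return fixed.sinkOutput;
    }

    public static void main(String[] args) {
        System.out.println("cancel + redeploy: " + run(false));
        System.out.println("update:            " + run(true));
    }
}
```

In this model the cancel path ends with an empty output list, because m1 was acknowledged before the failing job was thrown away, while the update path delivers m1 once the fixed code runs.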