At what stage does Dataflow/Apache Beam ack a pub/

2019-07-12 17:19发布

问题:

I have a dataflow streaming job with Pub/Sub subscription as an unbounded source. I want to know at what stage does dataflow acks the incoming pub/sub message. It appears to me that the message is lost if an exception is thrown during any stage of the dataflow pipeline.

Also I'd like to know how to the best practices for writing dataflow pipeline with pub/sub unbounded source for message retrieval on failure. Thank you!

回答1:

The Dataflow Streaming Runner acks pubsub messages received by a bundle after the bundle has succeeded and results of the bundle (outputs and state mutations etc) have been durably committed. Failed bundles are retried until they succeed, and don't cause data loss. If you believe that data loss may be happening, please include details (job id and your reasoning that lead you to conclude that data has been dropped because of the failures) and we'll investigate.