Notifying Google Pub/Sub when a Dataflow job is complete

Posted 2019-07-29 11:11

Is there a way to publish a message to Google Pub/Sub after a Google Dataflow job completes? We need to notify dependent systems that the processing of incoming data is complete. How can Dataflow publish a message after writing data to the sink?

EDIT: We want to send the notification after the pipeline finishes writing to GCS. Our pipeline looks like this:

 
Pipeline p = Pipeline.create(options);
p.apply(....)
 .apply(AvroIO.Write.named("Write to GCS")
                    .withSchema(Extract.class)
                    .to(options.getOutputPath())
                    .withSuffix(".avro"));
p.run();

If we add logic outside of the pipeline.apply(...) methods, we are notified when the code finishes executing, not when the pipeline itself has completed. Ideally we could add another .apply(...) after the AvroIO sink and publish a message to Pub/Sub from there.

1 Answer
叼着烟拽天下
#2 · 2019-07-29 12:01

You have two options for getting notified when your pipeline finishes, after which you can publish a message (or do whatever else you need to do once the pipeline has finished running):

  1. Use the BlockingDataflowPipelineRunner. This runs your pipeline synchronously: run() does not return until the job has finished, so any code placed after it executes once the job is done.
  2. Use the DataflowPipelineRunner. This runs your pipeline asynchronously: run() returns immediately, and you can poll the returned job for its status until it reaches a terminal state (see the sketch after this list).
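
Here is a minimal sketch of option 1. It assumes the Dataflow 1.x SDK shown in the question and the google-cloud-pubsub client library for publishing; the project ID, topic name, and message payload are placeholders:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.ProjectTopicName;
import com.google.pubsub.v1.PubsubMessage;

public class NotifyWhenDone {
  public static void main(String[] args) throws Exception {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    // The blocking runner makes run() wait for the Dataflow job to finish.
    options.setRunner(BlockingDataflowPipelineRunner.class);

    Pipeline p = Pipeline.create(options);
    // ... apply the transforms from the question, ending with the AvroIO sink ...
    p.run(); // returns only once the job has completed

    // The job is done; notify downstream systems over Pub/Sub.
    // "my-project" and "pipeline-complete" are placeholder names.
    ProjectTopicName topic = ProjectTopicName.of("my-project", "pipeline-complete");
    Publisher publisher = Publisher.newBuilder(topic).build();
    PubsubMessage message = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8("GCS write complete"))
        .build();
    publisher.publish(message).get(); // block until the publish is acknowledged
    publisher.shutdown();
  }
}

For option 2, the publishing code is the same; the difference is that run() returns a job object immediately, and you poll its getState() until it reports a terminal state such as DONE before publishing.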