It's all in the title. I'd like to kick off batch jobs from my streaming jobs, and being able to see the watermark as an indicator of when to start would be wonderful.
You might be able to accomplish this by using Pub/Sub to publish a signal that triggers whatever external processing you want.
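For example, here is a minimal sketch in the Beam Java SDK; the topic name and the message payload are just placeholder assumptions:

```java
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Assuming `signals` is a PCollection of events that should trigger the
// external batch processing: convert each one to a message and publish it
// to a Pub/Sub topic that the external system subscribes to.
PCollection<String> messages = signals.apply(
    MapElements.into(TypeDescriptors.strings())
        .via(signal -> "start-batch"));  // hypothetical payload

messages.apply(
    PubsubIO.writeStrings()
        .to("projects/my-project/topics/batch-trigger"));  // hypothetical topic
```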
To control the frequency of that signal, you could use a ParDo to filter your records down based on some criterion, which might take the event timestamps into account.
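As a sketch, a DoFn that only passes elements whose event timestamps land on a five-minute boundary; the boundary check is a deliberately crude, illustrative criterion, not a recommendation:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.joda.time.Duration;

// Hypothetical filter: emit an element only when its event timestamp
// falls exactly on a five-minute boundary, thinning the signal stream.
PCollection<String> thinned = signals.apply(ParDo.of(
    new DoFn<String, String>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        long millis = c.timestamp().getMillis();
        long interval = Duration.standardMinutes(5).getMillis();
        if (millis % interval == 0) {  // crude criterion for illustration
          c.output(c.element());
        }
      }
    }));
```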
If you explicitly want to key off the watermark, you could use windowing and triggers to produce records only after the watermark passes the end of each window.
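A minimal sketch of that approach, again in the Beam Java SDK: fixed five-minute windows (the size is an arbitrary choice) with an after-watermark trigger, so each pane fires only once the watermark has passed the end of its window.

```java
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

// Emit output only once the watermark passes the end of each
// five-minute window, effectively using watermark progress as the
// "start now" signal for downstream processing.
PCollection<String> gated = signals.apply(
    Window.<String>into(FixedWindows.of(Duration.standardMinutes(5)))
        .triggering(AfterWatermark.pastEndOfWindow())
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
```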
I don't think there is any explicit way to access the watermark.