In a streaming Beam pipeline, the trigger is configured as follows:
Window.into(FixedWindows.of(Duration.standardHours(1)))
    .triggering(AfterWatermark
        .pastEndOfWindow()
        .withEarlyFirings(AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardMinutes(15))))
    .withAllowedLateness(Duration.standardHours(1))
    .accumulatingFiredPanes()
1. If no new data arrives between the early firing (15 minutes after the first element in the current window) and the watermark passing the end of the window, will the trigger fire again at that point?
2. If yes, under the same scenario, will it still fire when the watermark passes the end of the window if accumulatingFiredPanes is changed to discardingFiredPanes?
Yes. There should always be a firing when the watermark passes the end of the window. The early firing panes will be marked as early, and the watermark pane will be marked as on time.
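To make the scenario concrete, here is a toy model in plain Java of the pane sequence described above. It is not Beam code and does not reproduce Beam's trigger implementation: the class PaneSketch, the method simulate, and the Pane type are hypothetical names, and the model assumes exactly one early firing followed by the on-time firing for a single window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PaneSketch {
    enum Timing { EARLY, ON_TIME }

    // One emitted pane: its timing label and the elements it carries.
    static final class Pane {
        final Timing timing;
        final List<String> elements;

        Pane(Timing timing, List<String> elements) {
            this.timing = timing;
            this.elements = elements;
        }

        @Override
        public String toString() {
            return timing + " " + elements;
        }
    }

    // Toy model (assumption, not Beam's implementation): one early firing,
    // then the on-time firing, for a single window.
    // beforeEarly: elements that arrive before the early firing.
    // afterEarly:  elements that arrive between the early firing and the
    //              watermark passing the end of the window.
    static List<Pane> simulate(
            List<String> beforeEarly, List<String> afterEarly, boolean accumulating) {
        List<Pane> panes = new ArrayList<>();
        List<String> buffer = new ArrayList<>(beforeEarly);

        // Early firing: emit whatever has been buffered so far.
        panes.add(new Pane(Timing.EARLY, new ArrayList<>(buffer)));

        // Discarding mode drops already-fired elements; accumulating keeps them.
        if (!accumulating) {
            buffer.clear();
        }
        buffer.addAll(afterEarly);

        // On-time firing: emitted even when nothing new has arrived.
        panes.add(new Pane(Timing.ON_TIME, new ArrayList<>(buffer)));
        return panes;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "b");
        List<String> none = Collections.<String>emptyList();
        System.out.println("accumulating: " + simulate(first, none, true));
        System.out.println("discarding:   " + simulate(first, none, false));
    }
}
```

In this model both modes produce an ON_TIME pane. With accumulation it repeats the elements from the early pane, and with discarding it carries no elements.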
Yes, currently we always guarantee an ON_TIME pane, which means there will be a firing when the watermark passes the end of the window.
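For #2, you can set the Window.ClosingBehavior as the second parameter to withAllowedLateness. There are two variants: FIRE_ALWAYS, which always fires the last pane, and FIRE_IF_NON_EMPTY, which fires the last pane only if there is new data since the previous firing. See https://beam.apache.org/releases/javadoc/2.6.0/org/apache/beam/sdk/transforms/windowing/Window.ClosingBehavior.html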
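As a sketch of what passing that second parameter could look like, the snippet below rebuilds the windowing transform from the question with an explicit closing behavior and discardingFiredPanes. The wrapper class ClosingBehaviorSketch and the method hourlyWindow are hypothetical names, the choice of FIRE_IF_NON_EMPTY is only for illustration, and the code assumes the Beam Java SDK and Joda-Time are on the classpath. Per the linked Javadoc, ClosingBehavior governs the final pane emitted when the window is permanently closed.

```java
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class ClosingBehaviorSketch {
    // Hypothetical helper: the question's trigger, with ClosingBehavior
    // passed as the second argument to withAllowedLateness.
    static <T> Window<T> hourlyWindow() {
        return Window.<T>into(FixedWindows.of(Duration.standardHours(1)))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    .withEarlyFirings(
                        AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardMinutes(15))))
            // Final pane fires only if new data arrived since the last firing.
            .withAllowedLateness(
                Duration.standardHours(1), Window.ClosingBehavior.FIRE_IF_NON_EMPTY)
            .discardingFiredPanes();
    }
}
```

The transform would then be applied to a PCollection in the usual way, for example input.apply(ClosingBehaviorSketch.<MyType>hourlyWindow()), where input and MyType stand in for the pipeline's own collection and element type.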