Close window based on element value

Posted 2019-06-06 22:51

Question:

Is there a way to close a window when an input element has a flag value in the side output of a DoFn? For example, an event that indicates the end of a session would close the window.

I've been reading the docs, and triggers are mostly time-based. An example would be great.

Edit: Trigger.OnElementContext.forTrigger(ExecutableTrigger trigger) seems promising but ExecutableTrigger docs are pretty slim at the moment.

Answer 1:

I don't think this is available. There is only one data-driven trigger right now, elementCountAtLeast.

https://cloud.google.com/dataflow/model/triggers#data-driven-triggers

A workaround would be to copy the Sessions window function code and write a custom window function.

https://github.com/apache/beam/blob/890bc1a23f493b042f8c2de5c042970ce5ddca96/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Sessions.java

In short, you keep assigning elements to the same window until you see your terminating element, then start a new window.

https://github.com/apache/beam/blob/890bc1a23f493b042f8c2de5c042970ce5ddca96/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Sessions.java#L60
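The grouping rule described above can be sketched in plain Java. This is only an illustration of the idea, not a Beam WindowFn: it does not use the Beam API, the FlagSessions and Event names are hypothetical, and it assumes the events belong to one key and are already in timestamp order, which a real pipeline does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Beam: shows the "same window until the
// terminating element" rule on an in-memory, time-ordered list.
class FlagSessions {

    // Hypothetical event type: a timestamp plus a flag that ends the session.
    static final class Event {
        public final long timestampMillis;
        public final boolean endsSession;

        public Event(long timestampMillis, boolean endsSession) {
            this.timestampMillis = timestampMillis;
            this.endsSession = endsSession;
        }
    }

    // Splits time-ordered events into sessions. The terminating event is kept
    // in the session it closes; the next event starts a new session.
    public static List<List<Event>> split(List<Event> ordered) {
        List<List<Event>> sessions = new ArrayList<>();
        List<Event> current = new ArrayList<>();
        for (Event e : ordered) {
            current.add(e);
            if (e.endsSession) {
                sessions.add(current);
                current = new ArrayList<>();
            }
        }
        // Events after the last terminating element form a still-open session.
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(1L, false),
                new Event(2L, true),
                new Event(3L, false));
        System.out.println("sessions: " + split(events).size());
    }
}
```

A custom window function would have to express the same rule through window assignment and merging, where elements arrive one at a time and possibly out of order, so treat this only as a statement of the intended grouping.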



Answer 2:

At present, there is no way to trigger off the content of an element, unfortunately. From the Apache Beam Docs:

Beam provides one data-driven trigger, AfterPane.elementCountAtLeast(). This trigger works on an element count; it fires after the current pane has collected at least N elements.

There is currently an open ticket for more robust data-driven triggers. However (again, at present), it appears that the Beam team is filling out use cases for data-driven triggers one at a time (i.e. element count or timestamp), as opposed to adding broad-based support for triggering off arbitrary values within an element.

An ExecutableTrigger wraps a Trigger object for execution. See ExecutableTrigger docs.