Is 'timely and stateful' processing possible on Dataflow?

Posted 2019-07-29 12:10

Question:

I'm evaluating Apache Beam (Java SDK), with Google Cloud's Dataflow as the runner, for a somewhat complex state-machine workflow.

Specifically I want to take advantage of stateful processing and timers as explained in this blogpost:

https://beam.apache.org/blog/2017/08/28/timely-processing.html

Looking at the capabilities matrix page for Dataflow it says:

  • Timers: "Dataflow supports timers in non-merging windows". Ok that's fine.
  • Stateful processing:
    • "State is supported for non-merging windows". Ok fine.
    • "SetState and MapState are not yet supported." Hmm... That sounds like an issue. I'm unclear on what IS supported, though, and on whether SetState and MapState are needed for the approach in the blogpost.

So my question is: can I achieve the 'timely and stateful processing' approach explained in the blogpost on Dataflow? Are the required SDK features currently supported on Dataflow or perhaps coming soon?

Thanks in advance for any help

(The blogpost says to check the capability matrix which I've done... but as I'm just starting to evaluate Beam/Dataflow I'm unable to figure out if it's possible to do 'timely and stateful processing' using Dataflow as the runner.)