Processing with State and Timers

Posted 2019-07-25 08:08

Are there any guidelines or limitations for using stateful processing and timers with the Beam Dataflow runner (as of v2.1.0), such as limits on the size of state or the frequency of updates? The candidate streaming pipeline would use state and timers extensively for user session state, with Bigtable as durable storage.

1 Answer
我命由我不由天
#2 · 2019-07-25 08:19

Here is some general advice for your use case:

  • Aggregate multiple elements before setting a timer.
  • Don't create a timer per element; that would be excessive.
  • Try to aggregate state instead of accumulating a large amount of it. For example, to compute a mean, keep a sum and a count rather than storing every number.
  • Consider session windows for this use case.
  • On the Dataflow runner, state is not supported in merging windows, although the Beam model itself supports it.
  • Choose the state type that matches your access pattern, e.g. BagState for blind writes (appends that never read the existing contents).
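
To make the sum-and-count advice concrete, here is a minimal sketch in plain Python. It is not Beam API: `MeanAccumulator` and `running_mean` are hypothetical names invented for illustration. It only shows the shape of the state you would keep per key, which stays constant in size no matter how many elements arrive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeanAccumulator:
    """Constant-size state for a running mean: a sum and a count.

    Hypothetical helper for illustration; not part of Apache Beam.
    """
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "MeanAccumulator":
        # Fold one element into the state; the element itself is not retained.
        return MeanAccumulator(self.total + value, self.count + 1)

    def merge(self, other: "MeanAccumulator") -> "MeanAccumulator":
        # Two partial aggregates combine without revisiting any element.
        return MeanAccumulator(self.total + other.total, self.count + other.count)

    def mean(self) -> float:
        if self.count == 0:
            raise ValueError("mean of an empty accumulator is undefined")
        return self.total / self.count


def running_mean(values) -> MeanAccumulator:
    """Aggregate an iterable of numbers into a single accumulator."""
    acc = MeanAccumulator()
    for v in values:
        acc = acc.add(v)
    return acc
```

In a real pipeline this pair of numbers would probably live in a value or combining state cell rather than in a bag holding every element, but the exact state type depends on your SDK and version.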

For more detail, see the blog post "Stateful processing with Apache Beam."
