I'm using Google Dataflow 1.9 to save data into BigQuery tables. I'm looking for a way to control the name of the table into which a PCollection element is written, based on some value in that element. In our case, the elements contain a user ID, and we wish to write each element to its own per-user table, dynamically.
Answer 1:
With 1.9.0, the only options are to either (1) partition the elements into multiple output collections and then write each output collection to a specific table, or (2) window the elements and select the destination based on the window. Option 1 only works if there is a relatively small set of destination tables, and option 2 only works if the decision is based on the window, which won't fit your use case of per-user destinations very well.
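To make option 1 concrete, here is a minimal sketch of the routing logic only, written in plain Java so that it runs without the SDK. The class name UserPartitioner and the KNOWN_USERS list are hypothetical; the point is that the set of users has to be fixed when the pipeline is built, which is why this approach does not scale to arbitrary per-user tables.

```java
import java.util.Arrays;
import java.util.List;

public class UserPartitioner {
    // Hypothetical fixed set of users, known when the pipeline is constructed.
    static final List<String> KNOWN_USERS = Arrays.asList("alice", "bob", "carol");

    // One partition per known user, plus a final catch-all partition.
    static int numPartitions() {
        return KNOWN_USERS.size() + 1;
    }

    // Mirrors the contract of a partition function:
    // return an index in the range [0, numPartitions).
    static int partitionFor(String userId, int numPartitions) {
        int idx = KNOWN_USERS.indexOf(userId);
        return idx >= 0 ? idx : numPartitions - 1;
    }

    public static void main(String[] args) {
        int n = numPartitions();
        for (String user : Arrays.asList("bob", "dave")) {
            System.out.println(user + " -> partition " + partitionFor(user, n));
        }
    }
}
```

In a real pipeline, partitionFor would be the body of a Partition.PartitionFn passed to Partition.of, and each of the resulting collections would get its own BigQueryIO write pointed at that user's table.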
If you upgrade to 2.0.0, the destination may be specified by a function that receives the window and the data element, using either DynamicDestinations or a SerializableFunction. This would allow you to receive each element and then choose the destination table based on the user ID.
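As a rough illustration of the per-element choice, the sketch below shows only the part that turns a user ID into a table spec, in plain Java so that it runs without the SDK. The class name UserTableRouter, the project "my-project", the dataset "user_data" and the "user_" table prefix are all assumptions made up for the example.

```java
public class UserTableRouter {
    // Hypothetical project and dataset; substitute your own.
    static final String PROJECT = "my-project";
    static final String DATASET = "user_data";

    // Replace every character other than letters, digits and underscores,
    // so that the user ID is safe to embed in a table name.
    static String sanitize(String userId) {
        return userId.replaceAll("[^A-Za-z0-9_]", "_");
    }

    // Build a "project:dataset.table" spec for the given user.
    static String tableSpecFor(String userId) {
        return PROJECT + ":" + DATASET + ".user_" + sanitize(userId);
    }

    public static void main(String[] args) {
        System.out.println(tableSpecFor("42"));
        System.out.println(tableSpecFor("a-b@example"));
    }
}
```

With 2.0.0, as far as I understand the API, a function like tableSpecFor would be called from inside the SerializableFunction passed to BigQueryIO's to(...) method: read the user ID from the element carried by the ValueInSingleWindow and return a TableDestination built from the resulting spec. Note that sanitizing can map two different IDs (for example "a-b" and "a_b") to the same table, so a stricter scheme may be needed if your IDs contain such characters.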