Google Dataflow - save the data into multiple BigQuery tables

Published 2019-08-19 13:07

Question:

I'm using Google Dataflow 1.9 to save data into BigQuery tables. I'm looking for a way to control the table name into which a PCollection element is written, based on some value in that element. In our case, the elements contain a user ID, and we wish to write each one dynamically to its own per-user table.

Answer 1:

With 1.9.0 the only options are to either (1) partition the elements into multiple output collections and then write each output collection to a specific table, or (2) window the elements and select the destination based on the window. Option 1 only works if there is a relatively small, known-in-advance set of destination tables, and option 2 only works if the destination can be derived from the window, so neither fits your per-user use case well. A sketch of option 1 follows.
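As a minimal sketch of option 1 (not from the original answer), assuming the legacy Dataflow 1.9 Java SDK, a hypothetical fixed list of user IDs, and a hypothetical user_id field on each TableRow:

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.transforms.Partition;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.cloud.dataflow.sdk.values.PCollectionList;
import java.util.Arrays;
import java.util.List;

public class PartitionPerUser {
  // Splits rows into one output collection per known user, then writes each
  // collection to that user's table. Only feasible for a small, fixed user set.
  static void writePerUser(PCollection<TableRow> rows, TableSchema schema) {
    final List<String> userIds = Arrays.asList("alice", "bob", "carol"); // hypothetical

    // One partition per known user, plus a final catch-all for unknown users.
    PCollectionList<TableRow> perUser = rows.apply(
        Partition.of(userIds.size() + 1, new Partition.PartitionFn<TableRow>() {
          @Override
          public int partitionFor(TableRow row, int numPartitions) {
            int idx = userIds.indexOf((String) row.get("user_id"));
            return idx >= 0 ? idx : userIds.size();
          }
        }));

    for (int i = 0; i < userIds.size(); i++) {
      perUser.get(i).apply(BigQueryIO.Write
          .named("WriteUser_" + userIds.get(i))
          .to("my-project:my_dataset.user_" + userIds.get(i))
          .withSchema(schema));
    }
    // perUser.get(userIds.size()) holds unmatched rows; drop or dead-letter them.
  }
}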

If you upgrade to 2.0.0, the destination may be specified by a function that receives the window and the data element, using either DynamicDestinations or a SerializableFunction. This lets you inspect each element and choose the destination table based on its user ID, as in the sketch below.
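Here is a rough, self-contained sketch against the Beam 2.x BigQueryIO API (not from the original answer; the project, dataset, field names, and schema are hypothetical):

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class PerUserTableWrite {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Schema shared by all per-user tables (hypothetical fields).
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("user_id").setType("STRING"),
        new TableFieldSchema().setName("event").setType("STRING")));

    // Stand-in input; in practice this would come from your real source.
    PCollection<TableRow> rows = p.apply(Create.of(
            new TableRow().set("user_id", "alice").set("event", "login"),
            new TableRow().set("user_id", "bob").set("event", "click"))
        .withCoder(TableRowJsonCoder.of()));

    rows.apply(BigQueryIO.writeTableRows()
        .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
          @Override
          public TableDestination apply(ValueInSingleWindow<TableRow> element) {
            // Choose the table from the element itself: one table per user.
            String userId = (String) element.getValue().get("user_id");
            return new TableDestination(
                "my-project:my_dataset.user_" + userId,
                "Events for user " + userId);
          }
        })
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}

DynamicDestinations is the more general hook of the two; it is worth reaching for when each destination table also needs its own schema rather than the single shared one used above.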