Join Nested Structure Table using Dataflow Java co

Posted 2019-08-27 08:03

Question:

My objective is to join two tables: the second is a normal (flat) table and the first has a nested structure. The join key sits inside the nested structure of the first table. How can I join these two tables using Dataflow Java code? WithKeys (org.apache.beam.sdk.transforms.WithKeys) seems to accept only a direct column name and does not allow something like firstTable.columnname. Could someone help me solve this?

Answer 1:

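If both tables are similarly large, consider using the CoGroupByKey transform. Before applying it, you will have to transform each table into a PCollection keyed by the join key. WithKeys.of also accepts a function rather than a fixed value, so for the nested table the key can be read out of the nested record inside that function.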
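To make the keying step concrete, here is a minimal plain-Java sketch with no Beam dependency. It only models what the two keyed PCollections and the CoGroupByKey result would contain; it is not pipeline code. The row layout is an assumption: rows are modelled as Map<String, Object>, the field names "details" and "id" are hypothetical, and the class Grouped is an illustrative stand-in for the per-key result (similar in shape to Beam's CoGbkResult).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "key both tables, then group by key". Not Beam code.
class NestedKeyCoGroup {

    // Assumed layout: the first table keeps its join key at row["details"]["id"].
    @SuppressWarnings("unchecked")
    static String nestedKey(Map<String, Object> row) {
        Map<String, Object> details = (Map<String, Object>) row.get("details");
        return (String) details.get("id");
    }

    // The second table is flat, so its key is a top-level column (assumed name "id").
    static String flatKey(Map<String, Object> row) {
        return (String) row.get("id");
    }

    // All rows from both tables that share one key.
    static class Grouped {
        final List<Map<String, Object>> first = new ArrayList<>();
        final List<Map<String, Object>> second = new ArrayList<>();
    }

    static Map<String, Grouped> coGroup(List<Map<String, Object>> firstTable,
                                        List<Map<String, Object>> secondTable) {
        Map<String, Grouped> byKey = new LinkedHashMap<>();
        for (Map<String, Object> row : firstTable) {
            byKey.computeIfAbsent(nestedKey(row), k -> new Grouped()).first.add(row);
        }
        for (Map<String, Object> row : secondTable) {
            byKey.computeIfAbsent(flatKey(row), k -> new Grouped()).second.add(row);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> first = List.of(
            Map.<String, Object>of("details", Map.of("id", "a1"), "qty", 3));
        List<Map<String, Object>> second = List.of(
            Map.<String, Object>of("id", "a1", "name", "widget"),
            Map.<String, Object>of("id", "b2", "name", "gadget"));

        // Keys with rows on both sides are the inner-join matches.
        coGroup(first, second).forEach((key, g) ->
            System.out.println(key + ": first=" + g.first.size() + ", second=" + g.second.size()));
    }
}
```

In an actual pipeline, a function like nestedKey would be what you pass to WithKeys.of for the nested table, and a function like flatKey for the flat one. How a nested record is represented at runtime depends on the source you read from, so the cast inside nestedKey is something to verify against your own row type.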

If one table is significantly smaller than the other, a better option may be to feed the smaller PCollection as a side input to a ParDo over the larger PCollection and look up each element's match there.
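The side-input variant can be modelled the same way in plain Java, again without any Beam dependency. In this sketch the lookup map plays the role that a map-shaped side input (for example one built with View.asMap()) would play, and the loop body plays the role of the per-element ParDo logic. The field names "details", "id", and "name" are hypothetical, and the sketch assumes each key occurs at most once in the small table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a lookup join against a small table. Not Beam code.
class NestedKeyLookupJoin {

    // Assumed layout: the large table keeps its join key at row["details"]["id"].
    @SuppressWarnings("unchecked")
    static String nestedKey(Map<String, Object> row) {
        Map<String, Object> details = (Map<String, Object>) row.get("details");
        return (String) details.get("id");
    }

    // Index the small table by its top-level key (assumed unique per row).
    static Map<String, Map<String, Object>> index(List<Map<String, Object>> smallTable) {
        Map<String, Map<String, Object>> lookup = new HashMap<>();
        for (Map<String, Object> row : smallTable) {
            lookup.put((String) row.get("id"), row);
        }
        return lookup;
    }

    // For each large-table row, look up its match; unmatched rows are dropped (inner join).
    static List<Map<String, Object>> join(List<Map<String, Object>> largeTable,
                                          Map<String, Map<String, Object>> lookup) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : largeTable) {
            Map<String, Object> match = lookup.get(nestedKey(row));
            if (match != null) {
                Map<String, Object> merged = new HashMap<>(row);
                merged.putAll(match); // copy the small table's columns onto the row
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> large = List.of(
            Map.<String, Object>of("details", Map.of("id", "a1"), "qty", 3),
            Map.<String, Object>of("details", Map.of("id", "zz"), "qty", 1));
        List<Map<String, Object>> small = List.of(
            Map.<String, Object>of("id", "a1", "name", "widget"));

        System.out.println(join(large, index(small)).size() + " joined row(s)");
    }
}
```

This approach avoids a shuffle of the large table, but it only works when the small table fits comfortably in worker memory. To get left-join behaviour instead, emit the row unchanged when no match is found rather than dropping it.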