Idiomatic way to join on “secondary” keys

2019-08-18 03:22发布

问题:

If we have a stream that looks like this

Person {
     …
     OrganizationID
}

that we want to join with another stream

Organization {
     ID
     …
}

to create a composite record like so:

Person {
     …
     Organization {
           ID
           …
     }
}

What is the most idiomatic and efficient way to do so in the Apache Beam programming model?

NB: have seen side inputs recommended as a solution to similar problems like this, but it is not applicable here since the effect we are after is that every change to either Person or Organization should yield a new augmented Person-record.

回答1:

Edit:

The answer is, your example is not supported by Apache Beam due to missing retraction in Apache Beam implementation.

===================================================

Original answer:

You might want to check Join library[1] in Apache Beam.

Join in Beam model needs extra thinking on windowing strategies on your streams. Sounds like your streams does not require windowing, so say your streams are both in global window. But if you set global window on both of your streams, use default trigger and do Join like Beam's Join library, due to watermark never passes endless window, your Join will not emit any result. If you set repeatly data driven trigger (fire once seen enough elements), however, due to missing supporting for retraction in Beam, it's not clear how pre-emited result is refined for Join.

[1] https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L49



标签: apache-beam